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Abstract 

This  paper  is  concerned  with  the  construction  and  analysis  of  wavelet-based 
adaptive  algorithms  for  the  numerical  solution  of  elliptic  equations.  These  algo¬ 
rithms  approximate  the  solution  u  of  the  equation  by  a  linear  combination  of  N 
wavelets.  Therefore,  a  benchmark  for  their  performance  is  provided  by  the  rate  of 
best  approximation  to  u  by  an  arbitrary  linear  combination  of  N  wavelets  (so  called 
A-term  approximation),  which  would  be  obtained  by  keeping  the  N  largest  wavelet 
coefficients  of  the  real  solution  (which  of  course  is  unknown) .  The  main  result  of  the 
paper  is  the  construction  of  an  adaptive  scheme  which  produces  an  approximation 
to  u  with  error  0(A“®)  in  the  energy  norm,  whenever  such  a  rate  is  possible  by 
A-term  approximation.  The  range  of  s  >  0  for  which  this  holds  is  only  limited  by 
the  approximation  properties  of  the  wavelets  together  with  their  ability  to  compress 
the  elliptic  operator.  Moreover,  it  is  shown  that  the  number  of  arithmetic  opera¬ 
tions  needed  to  compute  the  approximate  solution  stays  proportional  to  A.  The 
adaptive  algorithm  applies  to  a  wide  class  of  elliptic  problems  and  wavelet  bases. 
The  analysis  in  this  paper  puts  forward  new  techniques  for  treating  elliptic  problems 
as  well  as  the  linear  systems  of  equations  that  arise  from  the  wavelet  discretization. 

AMS  subject  classification:  41A25,  41A46,  65F99,  65N12,  65N55. 

Key  Words:  Elliptic  operator  equations,  quasi  sparse  matrices  and  vectors,  best 
A-term  approximation,  fast  matrix  vector  multiplication,  thresholding,  adaptive 
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1  Introduction 

1.1  B  ackgr  ound 

Adaptive  methods,  such  as  adaptive  Finite  Elements  Methods  (FEM),  are  frequently  used 
to  numerically  solve  elliptic  equations  when  the  solution  is  known  to  have  singularities. 

*This  work  has  been  supported  in  part  by  the  Deutsche  Forschungsgemeinschaft  grants  Da  117/8-2, 
the  Office  of  Naval  Research  Contract  N0014-91-J1343  and  the  TMR  network  “Wavelets  in  Numerical 
Simulation” 
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A  typical  algorithm  uses  information  gained  during  a  given  stage  of  the  computation  to 
produce  a  new  mesh  for  the  next  iteration.  Thus,  the  adaptive  procedure  depends  on  the 
current  numerical  resolution  of  u.  Accordingly,  these  methods  produce  a  form  of  nonlinear 
approximation  of  the  solution,  in  contrast  with  linear  methods  in  which  the  numerical 
procedure  is  set  in  advance  and  does  not  depend  on  the  solution  to  be  resolved. 

The  motivation  for  adaptive  methods  is  that  they  provide  flexibility  to  use  hner  resolu¬ 
tion  near  singularities  of  the  solution  and  thereby  improve  on  the  approximation  efficiency. 
Since  the  startling  papers  [2,  3]  the  understanding  and  practical  realization  of  adaptive 
rehnement  schemes  in  a  hnite  element  context  has  been  documented  in  numerous  pub¬ 
lications  [3,  4,  5,  10,  31].  A  key  ingredient  in  most  adaptive  algorithms  are  a-posteriori 
error  estimators  or  indicators  derived  from  the  current  residual  or  the  solution  of  local 
problems.  They  consist  of  local  quantities  such  as  jumps  of  derivatives  across  the  interface 
between  adjacent  triangles  or  simplices.  One  often  succeeds  in  bounding  the  (global)  error 
of  the  current  solution  with  respect  to  the  energy  norm,  say,  by  sums  of  these  quantities 
from  below  and  above.  Thus  rehning  the  mesh  where  these  local  quantities  are  large  is 
hoped  to  reduce  the  bounds  and  hence  the  error  in  the  next  computation.  Computational 
experience  frequently  conhrms  the  success  of  such  techniques  for  elliptic  boundary  value 
problems  in  the  sense  that  adaptively  generated  highly  nonuniform  meshes  indeed  give 
rise  to  an  accuracy  that  would  require  the  solution  of  much  larger  systems  of  equations 
based  on  uniform  rehnements.  However,  on  a  rigorous  level  the  quantitative  gain  of  adap¬ 
tive  techniques  is  usually  not  clear.  The  central  question  is  whether  the  mesh  rehnements 
actually  result,  at  each  step,  in  some  fixed  error  reduction.  To  our  knowledge  only  in  [30] 
convergence  of  an  adaptive  scheme  has  been  established  for  a  rather  special  case  namely  a 
piecewise  linear  hnite  element  discretization  of  the  classical  Dirichlet  problem  for  Laplace’s 
equation.  There  is  usually  no  rigorous  proof  of  the  overall  convergence  of  such  schemes 
unless  one  assumes  some  quantitative  information  such  as  the  saturation  property  about 
the  unknown  solution  [10].  Saturation  properties  are  assumed  but  not  proven  to  hold. 

Moreover,  the  derivation  of  error  indicators  in  conventional  discretizations  hinges  on 
the  locality  of  diherential  operators.  Additional  difficulties  are  therefore  encountered  when 
considering  elliptic  operators  with  nonlocal  Schwartz  kernel  arising,  for  instance,  in  con¬ 
nection  with  boundary  integral  equations. 

In  summary  there  seem  to  be  at  least  two  reasons  for  this  state  of  ahairs:  (i)  There 
is  an  inherent  difficulty  even  for  local  operators  in  utilizing  the  information  available  at 
a  given  stage  in  the  adaptive  computation  to  guarantee  that  a  suitable  reduction  will 
occur  in  the  residual  error  during  the  next  adaptive  step,  (ii)  Finite  element  analysis  is 
traditionally  based  on  Sobolev  regularity  (see  e.g.  [11]  or  [12])  which  is  known  to  govern 
the  performance  of  linear  methods.  Only  recent  developments  in  the  understanding  of 
nonlinear  methods  have  revealed  that  Besov  regularity  is  a  decidedly  different  and  more 
appropriate  smoothness  scale  for  the  analysis  of  adaptive  schemes,  see  e.g.  [26]. 

In  view  of  the  signihcant  computational  overhead  and  severe  complications  caused  by 
handling  appropriate  data  structures  for  adaptive  schemes,  not  only  guaranteeing  conver¬ 
gence  but  above  all  knowing  its  speed  is  of  paramount  importance  for  deciding  whether 
or  under  which  circumstances  adaptive  techniques  actually  pay  off.  To  our  knowledge 
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nothing  is  known  so  far  about  the  actual  rate  of  convergence  of  adaptive  FEM  solvers  by 
which  we  mean  the  relation  between  the  accuracy  of  the  approximate  solution  and  the 
involved  degrees  of  freedom,  or  better  yet  the  number  of  arithmetic  operations. 

1.2  Wavelet  methods 

An  alternative  to  FEM  are  wavelet  based  methods.  Similarily  to  mesh  refinement  in  FEM, 
these  methods  offer  the  possibility  to  compress  smooth  functions  with  isolated  singulari¬ 
ties,  into  high-order  adaptive  approximations  involving  a  small  number  of  basis  functions. 
In  addition,  it  has  been  recognized  for  some  time  [8]  that  for  a  large  class  of  operators 
(including  integral  operators)  wavelet  bases  give  rise  to  matrix  representations  that  are 
quasi-sparse  (see  §§2-3  for  a  definition  of  quasi-sparse)  and  admit  simple  diagonal  pre¬ 
conditioners  in  the  case  of  elliptic  operators.  Therefore,  it  is  natural  to  develop  adaptive 
strategies  based  on  wavelet  discretizations  in  order  to  solve  numerically  elliptic  operator 
equations. 

The  state  of  wavelet-based  solvers  is  still  in  its  infancy,  and  certain  inherent  imped¬ 
iments  to  their  numerical  use  remain.  These  are  mainly  due  to  the  difficulty  of  dealing 
with  realistic  domain  geometries.  Nevertheless,  these  solvers  show  great  promise,  espe¬ 
cially  for  adaptive  approximation  (see  e.g. [1,  9,  13,  15,  20])  .  Most  adaptive  strategies 
exploit  the  fact  that  wavelet  coefficients  convey  detailed  information  on  the  local  regular¬ 
ity  of  a  function  and  thereby  allow  the  detection  of  its  singularities.  The  rule  of  thumb  is 
that  wherever  wavelet  coefficients  of  the  currently  computed  solution  are  large  in  modulus 
additional  refinements  are  necessary.  In  some  sense,  this  amounts  to  using  the  size  of  the 
computed  coefficients  as  local  a-posteriori  error  indicators.  Note  that  here  refinement 
has  a  somewhat  different  meaning  than  in  the  finite  element  setting.  There  the  adapted 
spaces  result  from  refining  a  mesh.  The  mesh  is  the  primary  controlling  device  and  may 
create  its  own  problems  (of  geometric  nature)  that  have  nothing  to  do  with  the  underlying 
analytic  task.  In  the  wavelet  context  refinement  means  to  add  suitably  selected  further 
basis  functions  to  those  that  are  used  to  approximate  the  current  solution.  We  refer  to 
this  as  space  refinement. 

In  spite  of  promising  numerical  performances,  the  problem  remains  (as  in  the  finite 
element  context)  to  quantify  these  strategies,  that  is,  to  decide  which  and  how  many 
additional  wavelets  need  to  be  added  in  a  refinement  step  in  order  to  guarantee  a  fixed 
error  reduction  rate  at  the  next  resolution  step.  An  adaptive  wavelet  scheme  based  on 
a-posteriori  error  estimators  has  been  recently  developed  in  [17],  which  ensures  this  fixed 
error  reduction  for  a  wide  class  of  elliptic  operators  including  those  of  negative  order. 
This  shows  that  making  use  of  the  characteristic  features  of  wavelet  expansions,  such  as 
the  sparsification  and  preconditioning  of  elliptic  operators,  allows  one  to  go  beyond  what 
is  typically  known  in  the  conventional  framework  of  adaptive  FEM.  However,  similar  to 
FEM,  there  are  so  far  no  results  about  the  rate  of  convergence  of  adaptive  wavelet  based 
solvers,  i.e.,  the  dependence  of  the  error  on  the  number  of  degrees  of  freedom. 
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1.3  The  objectives 

The  purpose  of  the  present  paper  is  twofold.  Firstly,  we  provide  analytical  tools  that 
can  be  utilized  in  studying  the  theoretical  performance  of  adaptive  algorithms.  Secondly, 
we  show  how  these  tools  can  be  used  to  construct  and  analyze  wavelet  based  adaptive 
algorithms  which  display  optimal  approximation  and  complexity  properties  in  the  sense 
that  we  describe  below. 

The  adaptive  methods  we  analyze  in  this  paper  take  the  following  form.  We  assume 
that  we  have  in  hand  a  wavelet  basis  {t^AjAev  to  be  used  for  numerically  resolving  the 
elliptic  equation.  Our  adaptive  scheme  will  iteratively  produce  finite  sets  Kj  C  V,  j  = 
1,  2, . . .,  and  the  Galerkin  approximation  to  u  from  the  space  Sp^.  :=  span({'^A}AeAj)- 
The  function  ua  is  a  linear  combination  of  Nj  :=  wavelets.  Thus  the  adaptive 
method  can  be  viewed  as  a  particular  form  of  nonlinear  W-term  wavelet  approximation  and 
a  benchmark  for  the  performance  of  such  an  adaptive  method  is  provided  by  comparison 
with  best  N -term  approximation  (in  the  energy  norm)  when  full  knowledge  of  u  is  available. 

Much  is  known  about  W-term  approximation.  In  particular,  there  is  a  characterization 
of  the  functions  v  that  can  be  approximated  in  the  energy  norm  with  accuracy  0{N~^)  by 
using  linear  combinations  of  N  wavelets.  As  we  already  mentioned  this  class  is  typically 
a  Besov  space,  which  is  substantially  larger  than  the  corresponding  Sobolev  space 
which  ensures  0{N~^)  accuracy  for  uniform  discretization  with  N  parameters.  In  several 
instances  of  the  elliptic  problems,  e.g.  when  the  right  hand  side  /  has  singularities,  or 
when  the  boundary  of  G  has  corners,  the  Besov  regularity  of  the  solution  will  exceed 
its  Sobolev  regularity  (see  [16]  and  [18]).  So  these  solutions  can  be  approximated  better 
by  best  W-term  approximation  than  by  uniformly  refined  spaces  and  the  use  of  adaptive 
methods  is  suggested.  Another  important  feature  of  W-term  approximation  is  that  a 
near  best  approximation  is  produced  by  thresholding,  i.e.,  simply  keeping  the  N  largest 
contributions  (measured  in  the  same  metric  as  the  approximation  error)  of  the  wavelet 
expansion  of  v. 

Of  course,  since  best  W-term  approximation  requires  complete  information  on  the 
approximated  function  it  cannot  be  applied  directly  to  the  unknown  solution.  It  is  cer¬ 
tainly  not  clear  beforehand  whether  at  all  a  concrete  numerical  scheme  can  produce  at 
least  asymptotically  the  same  convergence  rate.  Thus  ideally  an  optimal  adaptive  wavelet 
algorithm  should  produce  a  similar  result  as  thresholding  the  exact  solution.  In  more 
quantitative  terms  this  means  whenever  the  solution  u  is  in  the  approximations  up^. 
should  satisfy 

h-«A,||  <  C\\u\\B^N--f  Nj  :=  #A„  (1.1) 

where  ||  ■  ||  is  the  energy  norm  and  ||  ■  is  the  norm  for  B‘^.  Since  in  practice  one  is  mostly 
interested  in  controlling  a  prescribed  accuracy  with  a  minimal  number  of  parameters,  we 
shall  rather  say  that  the  adaptive  algorithm  is  of  optimal  order  s  >  0  if  whenever  the 
solution  u  is  in  B^,  then  for  all  e  >  0,  there  exists  j{e)  such  that 

\\u-UA^\\<e,  j>i{e),  (1.2) 
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and  such  that 


#(A,(.))  <  C\\u\\ti-^'‘.  (1.3) 

Such  a  property  ensures  an  optimal  memory  size  for  the  description  of  the  approximate 
solution. 

Another  crucial  aspect  of  the  adaptive  algorithms  is  their  computational  complexity: 
we  shall  say  that  the  adaptive  algorithm  is  computationally  optimal  if,  in  addition  to 
((1.2)-(1.3)),  the  number  of  arithmetic  operation  needed  to  derive  is  proportional  to 
T^Aj.  Note  that  an  instance  of  computational  optimality  in  the  context  of  linear  methods 
is  provided  by  the  full  multigrid  algorithm  when  N  represents  the  number  of  unknowns 
necessary  to  achieve  a  given  accuracy  on  a  uniform  grid.  We  are  thus  interested  in 
algorithms  that  exhibit  the  same  type  of  computational  optimality  with  respect  to  an 
optimal  adaptive  grid  which  is  not  known  in  advance  and  should  itself  be  generated  by 
the  algorithm. 

The  main  accomplishment  of  this  paper  is  the  development  of  an  adaptive  numerical 
scheme  which  for  a  wide  class  of  operator  equations  (including  those  of  negative  order)  is 
optimal  with  regard  to  best  iV-term  approximation  and  also  computationally  optimal  in 
the  above  sense. 

1.4  Organization  of  the  paper 

In  §2,  we  introduce  the  general  setting  of  elliptic  operator  equations  where  our  results 
apply.  In  this  context,  after  applying  a  diagonal  preconditioner,  wavelet  discretizations 
allow  us  to  view  the  equation  as  a  discrete  well  conditioned  £2  linear  system. 

In  §  3,  we  review  certain  aspects  of  nonlinear  approximation,  quasi-sparse  matrices  and 
fast  multiplication  using  such  matrices.  The  main  result  of  this  section  is  an  algorithm 
for  the  fast  computation  of  the  application  of  a  quasi-sparse  matrix  to  a  vector. 

In  §  4,  we  analyze  the  rate  of  convergence  of  the  rehnement  procedure  introduced 
earlier  in  [17].  We  will  refer  to  this  scheme  here  as  Algorithm  I.  We  show  that  this 
algorithm  is  optimal  for  a  small  range  of  s  >  0.  However,  the  full  range  of  optimality 
should  be  limited  only  by  the  properties  of  the  wavelet  basis  (smoothness  and  vanishing 
moments)  and  the  operator  which  is  not  the  case  for  Algorithm  I.  The  analysis  in  §  4 
identihes  however  the  barrier  that  keeps  Algorithm  I  from  being  optimal  in  the  full  range 
of  s. 

In  §  5,  we  introduce  a  second  strategy  -  Algorithm  II  -  for  adaptively  generating  the 
sets  Aj  that  is  shown  to  provide  optimal  approximation  of  order  s  >  0  for  the  full  range 
of  s.  The  new  ingredient  that  distinguishes  Algorithm  II  from  Algorithm  I  is  the  addition 
of  thresholding  steps  which  delete  some  indices  from  Aj.  This  would  be  the  analogue  of 
coarsening  the  mesh  in  FEM. 

Although  we  have  qualihed  so  far  both  procedures  in  §  4  and  §  5  as  “algorithms” ,  we 
have  actually  ignored  any  issue  concerning  practical  realization.  They  are  idealized  in 
the  sense  that  the  exact  assessment  of  residuals  and  Galerkin  solutions  is  assumed.  This 
was  done  in  order  to  identify  clearly  the  essential  analytical  tasks.  Practical  realizations 
require  truncations  and  approximations  of  these  quantities.  §  6  is  devoted  to  developing 
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the  ingredients  of  a  realistic  numerical  scheme.  This  includes  quantitative  thresholding 
procedures,  approximate  matrix/vector  multiplication,  approximate  Galerkin  solvers  and 
the  approximate  evaluation  of  residuals. 

In  §  7  we  employ  these  ingredients  to  formulate  a  computable  version  of  Algorithm 
II  which  is  shown  to  be  computationally  optimal  for  the  full  range  of  s.  Recall  that 
this  means  that  it  realizes  for  this  range  the  order  of  best  A^-term  approximation  at  the 
expense  of  a  number  of  arithmetic  operations  that  stays  proportional  to  the  number  N 
of  significant  coefficients.  Computational  optimality  hinges  to  a  great  extent  on  the  fast 
approximate  matrix/vector  multiplication  from  §  3. 

It  should  be  noted  however  that  an  additional  cost  in  our  wavelet  adaptive  algorithm 
is  incurred  by  sorting  the  coefficients  in  the  currently  computed  solution.  This  cost  at 
stage  j  is  of  order  A^log  where  N  =  t^Aj,  thus  slightly  larger  than  the  cost  in  arithmetic 
operations.  It  should  be  stressed  that  the  complexity  of  the  algorithm  is  analysed  under 
the  assumption  that  the  solution  exhibits  a  certain  rate  of  best  A^-term  approximation 
which  is,  for  instance,  implied  by  a  certain  Besov  regularity.  The  algorithm  itself  does 
not  require  any  a-priori  assumption  of  that  sort. 

We  have  decided  to  carry  out  the  (admittedly  more  technical)  analysis  of  the  numerical 
ingredients  in  some  detail  in  order  to  substantiate  our  claim  that  the  optimality  analysis 
is  not  based  on  any  hidden  assumptions  (beyond  those  hypotheses  that  are  explicitly 
stated)  such  as  accessing  infinitely  many  data.  Nevertheless  the  main  message  of  this 
paper  can  be  read  in  §  4  and  §  5:  optimal  adaptive  approximations  of  elliptic  equations 
can  be  computed  by  iterative  wavelet  refinements  using  a-posteriori  error  estimators, 
provided  that  the  computed  solution  is  regularly  updated  by  appropriate  thresholding 
procedures.  This  fact  was  already  suggested  by  numerical  experiments  in  [15]  that  show 
similar  behavior  between  the  numerical  error  generated  by  such  adaptive  algorithms  and 
by  thresholding  the  exact  solution. 

2  The  Setting 

In  this  section,  we  shall  introduce  the  setting  in  which  our  results  apply.  In  essence, 
our  analysis  applies  whenever  the  elliptic  operator  equation  takes  place  on  a  manifold  or 
domain  which  admits  a  biorthogonal  wavelet  basis. 

2.1  Ellipticity  Assumptions 

This  subsection  gives  the  assumptions  we  make  on  the  operator  equation  to  be  numerically 
solved.  These  assumptions  are  quite  mild  and  apply  in  great  generality. 

Let  C  denote  a  bounded  open  domain  in  the  Euclidean  space  IR'^  with  Lipschitz  bound¬ 
ary  or,  more  generally,  a  Lipschitz  manifold  of  dimension  d.  In  particular,  C  could  be 
a  closed  surface  which  arises  as  a  domain  for  a  boundary  integral  equation.  The  space 
L2(C)  consists  of  all  square  integrable  functions  on  C  with  respect  to  the  (canonically 
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induced)  Lebesgue  measure.  The  corresponding  inner  product  is  denoted  by 

{■,-)L2{n)-  (2.1) 

Let  A  be  a  linear  operator  mapping  a  Hilbert  space  H  into  H*  (its  dual  relative  to 
the  pairing  (•,  •)l2(0))  where  iL  is  a  space  with  the  property  that  either  H  or  its  dual  H* 
is  embedded  in  L2(ll).  The  operator  A  induces  the  bilinear  form  a  defined  on  H  x  H  by 

a{u,v)  :=  {Au,v),  (2.2) 

where  (•,  •)  denotes  the  {H*,H)  duality  product. 

(Al):  We  assume  that  the  bilinear  form  a  is  symmetric  positive  definite  and  elliptic 
in  the  sense  that 

a(n, n)  ~  ||n||^,  v  E  H.  (2.3) 

Here,  and  throughout  this  paper,  ~  means  that  both  quantities  can  be  uniformly  bounded 
by  constant  multiples  of  each  other.  Likewise  <  indicates  inequalities  up  to  constant 
factors. 

It  follows  that  H  is  also  a  Hilbert  space  with  respect  to  the  inner  product  a  and  that 
this  inner  product  induces  an  equivalent  norm  (called  the  energy  norm)  on  H  by 


II -11^=  «(•,•)•  (2.4) 

By  duality,  A  thus  defines  an  isomorphism  from  H  onto  H*.  We  shall  study  the  equation 

Au  =  f  (2.5) 

with  /  G  H*.  From  our  assumptions,  it  follows  that  for  any  /  G  H*,  this  equation  has  a 
unique  solution  in  H,  which  will  always  be  denoted  by  u.  This  is  also  the  unique  solution 
of  the  variational  equation 


a{u,v)  =  {f,v),  for  all  v  E  H.  (2.6) 

The  typical  examples  included  in  the  above  assumptions  are  Poisson’s,  Helmholtz  or 
the  biharmonic  equations  on  bounded  domains  in  IR'^;  single  or  double  layer  potentials 
and  hypersingular  operators  on  closed  surfaces  arising  in  the  context  of  boundary  integral 
equations.  In  these  examples  H  is  a  Sobolev  space,  e.g.  H  =  Hq{Q),  Hq{Q),  ot  H  = 
iL“^/^(f2);  see  [19,  17,  36]  for  examples. 

2.2  Wavelet  Assumptions 

By  now  wavelet  bases  are  available  for  various  types  of  domains  that  are  relevant  for 
the  formulation  of  operator  equations.  This  covers,  for  instance,  polyhedral  surfaces  of 
dimension  two  and  three  [24]  as  well  as  manifolds  or  domains  that  can  be  represented  as 
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a  disjoint  union  of  smooth  regular  parametric  images  of  a  simple  parameter  domain  such 
as  the  unit  cube  [22], 

There  are  many  excellent  accounts  of  wavelets  on  JR'^  (see  e.g.  [33]  or  [25]).  For  the 
construction  and  description  of  wavelet  bases  on  domains  and  manifolds,  we  refer  the 
reader  to  the  survey  paper  [19]  and  the  references  therein.  This  survey  also  sets  forth 
the  notation  we  shall  employ  below  for  indexing  the  elements  in  a  wavelet  basis.  To 
understand  this  notation,  it  may  be  useful  for  the  reader  to  keep  in  mind  the  case  of 
wavelet  bases  on  IR'^.  In  this  setting,  a  typical  biorthogonal  wavelet  basis  of  compactly 
supported  functions  is  given  by  the  shifted  dilates  of  a  set  T  of  2“^  —  1  functions.  Namely, 
the  collection  of  functions 

.-k),  j  e^,  ke  7  e  T,  (2.7) 

form  a  Riesz  basis  for  L2{1R'^).  The  dual  basis  is  given  by 

2^<i/^{2^  .-k),  j  e^,  ke  7  e  f,  (2.8) 

with  T  again  a  set  of  2“^  —  1  functions.  The  integer  j  gives  the  dyadic  level  (2-^  the 
frequency)  of  the  wavelet.  The  multiinteger  k  gives  the  position  {2~^k)  of  the  wavelet. 
Namely,  the  wavelet  has  support  contained  in  a  cube  of  diameter  <  2~^  centered  at  the 
point  2~^k.  Note  that  there  are  2'^  —  1  functions  with  the  same  dyadic  level  j  and  position 
2-%. 

Another  way  to  construct  a  wavelet  basis  for  is  to  start  the  multiscale  decompo¬ 
sition  at  a  finite  dyadic  level  jo.  In  this  case,  the  basis  consists  of  the  functions  of  (2.7) 
with  j  >  jo,  together  with  a  family  of  functions 

2iod/20('2io  .  ^2.9) 

with  (j)  a  fixed  (scaling)  function.  Wavelet  bases  for  domains  take  a  similar  form  except 
that  some  alterations  are  necessary  near  the  boundary. 

We  shall  denote  wavelet  bases  by  {'^AjAev-  In  fhe  particular  case  above,  this  notation 
incorporates  the  three  parameters  j, /c,7  into  the  one  A.  We  use  |A|  :=  j  to  denote  the 
dyadic  level  of  the  wavelet.  We  let  =  {'ijx  :  A  G  V^},  Vj  :=  {A  G  V  ;  |A|  =  j},  consist 
of  the  wavelets  at  level  j. 

In  all  classical  constructions  of  compactly  supported  wavelets,  there  exists  fixed  con¬ 
stants  C  and  M  such  that  diam(supp('?/^A))  <  and  such  that  for  all  A  G  V j  there 

are  at  most  M  indices  p  G  Vj  such  that  meas(supp(V'A)  Fl  supp('?/^^))  7^  0. 

Since  we  shall  consider  only  bounded  domains  in  this  paper,  the  wavelet  decomposition 
will  begin  at  some  fixed  level  jo.  For  notational  convenience  only,  we  assume  jo  =  1.  We 
define  Tq  to  be  the  set  of  scaling  functions  in  the  wavelet  basis.  We  shall  assume  that  hi 
is  a  domain  or  manifold  which  admits  two  sets  of  functions: 

T  =  {V^A  :  A  G  V}  C  L2{n),  T  =  :  A  G  V}  C  ^2(1^)  (2.10) 

that  form  a  biorthogonal  wavelet  bases  on  hi:  writing  (0,  <F)  :=  ((^1,  0)L2(n))6»e0,</)e<J>  for 
any  two  collections  0,  <F  of  functions  in  L2(hl),  one  has 

(T,'F)=I, 
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(2.11) 


where  I  is  the  identity  matrix. 

A  typical  feature  in  the  theory  of  biorthogonal  bases  is  that  the  sequences  d/,  d'  are 
Riesz-bases.  That  is,  using  the  shorthand  notation  :=  ZIasv one  has 

l|d||£2(v)  ~  ~  Ild^^lUzlo)-  (2-12) 

This  property  means  that  the  wavelet  bases  characterize  L2{Vt).  In  the  present  context  of 
elliptic  equations,  we  shall  not  need  (2.12)  but  rather  that  these  bases  provide  a  charac¬ 
terization  of  H  and  H*  in  terms  of  wavelet  coefficients.  This  is  expressed  by  the  following 
specific  assumption. 

(A2):  Let  the  energy  space  H  be  equipped  with  the  norm  ||  ■  ||_f/  and  its  dual  space  H*  be 
equipped  with  the  norm  ||n||H*  :=  supn^H^^j^  |(n,tc)|.  We  assume  that  the  wavelets  in  T 
are  in  H ,  whereas  those  in  T  are  in  H*  {in  this  context,  we  can  assume  that  (2.11)  simply 
holds  in  the  sense  of  the  duality  {H,H*)).  We  assume  that  each  v  E  H  has  a  wavelet 
expansion  v  =  {with  coordinates  d\  =  {v,'ip\))  and  that 

l|D  M||£2(v)  ~  lld^thllif.  (2-13) 


with  D  a  fixed  positive  diagonal  matrix. 

Observe  that  (2.13)  implies  that  Da,a  ~  II'0a||h\  and  that  T  (resp.  Zl^^T)  is  an  un¬ 
conditional  (resp.  Riesz)  basis  for  H.  By  duality,  one  easily  obtains  that  each  v  G  H*  has 
a  wavelet  expansion  v  =  d^T  (with  coordinates  d\  =  (n,  V’a))  that  satisfies 

l|Dd||^2(v)  ~  ||d^T||H*-  (2-14) 

One  should  keep  in  mind  though  that  T  is  only  needed  for  analysis  purposes.  The  Galerkin 
schemes  to  be  considered  below  only  involve  T  while  T  never  enters  any  computation  and 
need  not  even  be  known  explicitly. 

It  is  well  known  (see  e.g.  [22])  that  wavelet  bases  provide  such  characterizations 
for  a  large  variety  of  spaces  (in  particular  the  Sobolev  and  Besov  spaces  for  a  certain 
parameter  range  which  depends  on  the  smoothness  of  the  wavelets).  In  the  context  of 
elliptic  equations,  H  is  typically  some  Sobolev  space  H^.  In  this  case  (A2)  is  satisfied 
whenever  the  wavelets  are  sufficiently  smooth,  with  Da, a  =  For  instance,  when 

A  =  —A,  one  has  t  =  1. 

2.3  Discretization  and  preconditioning  of  the  elliptic  equation 

Using  wavelets,  we  can  rewrite  (2.5)  as  an  infinite  system  of  linear  equations.  We  take 
wavelet  bases  T  and  T  satisfying  (A2)  and  write  the  unknown  solution  u  =  d^T  and  the 
given  right  hand  side  /  in  terms  of  the  basis  T.  This  gives  the  system  of  equations 

(Avl/,T)^d=(/,vI/)^.  (2.15) 
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The  solution  d  to  (2.15)  gives  the  wavelet  coefficients  of  the  solution  u  to  (2.5). 

An  advantage  of  wavelet  bases  is  that  they  allow  for  trivial  preconditiong  of  the  linear 
system  (2.15).  This  preconditioning  is  given  by  the  matrix  D  of  (A2)  and  results  in  the 
system  of  equations: 

D(AT,T)^DD-M  =  D(/,T)^,  (2.16) 

or  more  compactly, 

Au  =  f,  (2.17) 

where 

A  :=  D(AT,T)^D,  u  :=  D^M,  f  ;=  D(/,  G  £2(V).  (2.18) 

Let  us  briefly  explain  the  effect  of  the  above  diagonal  scaling  with  regard  to  precondition¬ 
ing.  To  this  end,  note  that  by  (Al),  the  matrix  A  is  symmetric  positive  definite.  We 
define  its  associated  bilinear  form  a  by 

a(v,w)  :=  (Av,w)£2(v),  (2.19) 

where  (•,  •)^2(v)  is  th®  standard  inner  product  in  f'2(V),  and  denote  the  norm  associated 
with  this  bilinear  form  by  ||  ■  ||.  In  other  words, 

||vf  ;=a(v,v),  ve4(V).  (2.20) 

Combining  the  ellipticity  assumption  (Al)  together  with  the  wavelet  characterization  of 
H  (A2),  we  obtain  that  ||  ■  ||  and  ||  ■  ||£2(v)  are  equivalent  norms,  i.e.,  there  exist  constants 
Cl,  C2  >0  such  that 

Ci||v||^2^(v)  <  ||vf  <  C2||v||^2^(v).  (2.21) 

It  is  immediate  that  these  constants  are  also  such  that 

ci||v||^2(v)  <  I|Av||^2(v)  <  C2||v||£2(v),  (2.22) 

and 

C2  iv||£2(v)  <  ||A-W||^2(v)  <  criv||£2(v)-  (2.23) 

In  particular,  the  condition  number  k  :=  ||A||||A^^||  of  A  satisfies 

K  <  C2C~[^.  (2.24) 

The  fact  that  the  diagonal  scaling  turns  the  original  operator  into  an  isomorphism  on 
f'2(V)  will  be  a  cornerstone  of  the  subsequent  development.  Denoting  by  ax^y  the  entries 
of  A  and  by  Aa  =  (aA,A')A,A'GA  the  section  of  A  restricted  to  the  set  A,  it  follows  from  the 
positive  definiteness  of  A  that 

IIAaII  <  ||A||,  IIA^i  <  IIA^i,  (2.25) 

and  that  the  condition  numbers  of  the  submatrices  remain  for  any  subset  A  C  V  uniformly 
bounded,  i.e., 

^(Aa)  ;=  ||Aa||||A;^^||  <  k.  (2.26) 
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Finally,  it  is  easy  to  check  that  the  constants  Ci  and  C2  also  provide  the  equivalence 

c\^^h\\  <  I|Av||^2(v)  <  (2.27) 

Here  and  later,  we  adopt  the  following  rule  about  denoting  constants.  We  shall  denote 
constants  which  appear  later  in  our  analysis  by  Ci,  C2,  •  ■  Other  constants,  whose  value  is 
not  so  important  for  us,  will  be  denoted  by  C  or  incorporated  into  the  <  ,  ~  notation. 

A  typical  instance  of  the  above  setting  involves  Sobolev  spaces  H  =  m  which  case 
the  entries  of  the  diagonal  matrix  D  can  be  chosen  as  Of  course,  the  constants 

in  (2.24)  will  then  depend  on  the  relation  between  the  energy  norm  (2.20)  and  the  Sobolev 
norm.  In  some  cases  such  a  detour  through  a  Sobolev  space  is  not  necessary  and  (2.13) 
can  be  arranged  to  hold  for  a  suitable  D  when  ||  ■  ||_ff  already  coincides  with  the  energy 
norm.  A  simple  example  is  Au  =  —eAu  +  u  where  (D)a  _A'  :=  max  {1,  A/e2l^l}5A.  A'  is  an 
appropriate  choice.  In  fact,  (2.13)  will  then  hold  independently  oi  e. 

2.4  Quasi-sparsity  assumptions  on  the  stiffness  matrix 

Another  advantage  of  the  wavelet  basis  is  that  for  a  large  class  of  elliptic  operators, 
the  resulting  preconditioned  matrix  A  exhibits  fast  decay  away  from  the  diagonal.  This 
will  later  be  crucial  with  regard  to  storage  economy  and  efficiency  of  (approximate)  ma¬ 
trix/vector  multiplication. 

Consider  for  example  the  (typical)  case  when  H  is  the  Sobolev  space  of  order  t  or 
its  subspace  Hq.  Then,  for  a  large  class  of  elliptic  operators,  we  have 

<  2“ll^l-l^'ll"(l +  d(A,  A'))"^,  (2.28) 

with  a  >  d/2  and  {3  >  d  and 

d(A,  A')  :=  dist(supp(V’A),  supp(V’A'))-  (2-29) 

The  validity  of  (2.28)  has  been  established  in  numerous  settings  (see  e.g.  [19,  8,  35,  37]). 
Decay  estimates  of  the  form  (2.28)  were  initially  introduced  in  [32]  in  the  context  of 
Littlewood-Paley  analysis.  The  constant  a  depends  on  the  smoothness  of  the  wavelets 
whereas  (3  is  related  to  the  approximation  order  of  the  dual  multiresolution  (resp.  the 
vanishing  moments  of  the  wavelets)  and  the  order  of  the  operator  A.  Estimates  of  the  type 
(2.28)  are  known  to  hold  for  a  wide  range  of  cases  including  classical  pseudo-differential 
operators  and  Calderon-Zygmund  operators  (see  e.g.  [21,  36]).  In  particular,  the  single 
and  double  layer  potential  operators  fall  into  this  category.  We  refer  the  reader  to  [19] 
for  a  full  discussion  of  settings  where  (2.28)  is  valid. 

We  introduce  the  class  of  all  matrices  B  =  (6A,A')A,A'ev  which  satisfy 

\bxM  <  CB2-ll"l-l"'ll'^(l  +  d(A,A'))-^  (2.30) 

with  d{\,  A')  defined  by  (2.29).  We  say  that  a  matrix  B  is  quasi-sparse  if  it  is  in  the  class 
for  some  a  >  d/2  and  {3  >  d.  Properties  of  quasi-sparse  matrices  will  be  discussed 

in  53. 


11 


(A3):  We  assume  that,  for  some  a  >  d/2,  (3  >  d,  the  matrix  A  of  (2.17)  is  in  the 
class  Ao-,13. 

Let  us  note  that  in  the  case  H  =  discussed  earlier,  we  obtain  (2.30)  from  (2.28) 
because  D  =  (2"*l^l(5A,A')A,A'ev- 

2.5  Wavelet  Galerkin  methods 

A  wavelet  based  Galerkin  method  for  solving  (2.5)  takes  the  following  form.  We  choose 
a  finite  set  A  of  wavelet  indices  and  use  the  space  Sa  ■=  spanj'^A  :  A  G  A}  as  our  trial 
and  analysis  space.  The  approximate  Galerkin  solution  ua  from  Sa  is  defined  by  the 
conditions 

a{uA,v)  =  V  e  Sa-  (2.31) 

We  introduce  some  notation  which  will  help  embed  the  finite  dimensional  problem 
(2.31)  into  the  infinite  dimensional  space  1*2 (V).  For  any  set  A  C  V,  we  let 

t'2(A)  :=  {v  =  (nA)Aev  G  ^2(V)  ;  ua  =  0,  A  ^  A}. 

Thus,  we  will  for  convenience  identify  a  vector  with  finitely  many  components  with  the 
sequence  obtained  by  setting  all  components  outside  its  support  to  zero.  Moreover,  let  Pa 
denote  the  orthogonal  projector  from  t'2(V)  onto  f'2(A),  that  is,  Pav  is  simply  obtained 
from  V  by  setting  all  coordinates  outside  A  to  zero. 

Using  the  preconditioning  matrix  D,  (2.31)  is  equivalent  to  the  finite  linear  system 

PaAua  =  PAf,  (2.32) 

with  unknown  vector  ua  G  and  where  A  and  f  refer  to  the  preconditioned  system 

given  in  (2.18).  The  solution  ua  to  (2.32)  determines  the  wavelet  coefficients  of  ua- 
Namely, 

UA  =  (Dua)^T.  (2.33) 

Of  course,  coefficients  corresponding  to  A  ^  A  are  zero. 

We  shall  work  almost  exclusively  in  the  remainder  of  this  paper  with  the  preconditioned 
discrete  system  (2.17).  Note  that  the  solution  ua  to  (2.32)  can  be  viewed  as  its  Galerkin 
approximation.  In  turn,  it  has  the  property  that 

||u-ua||=  inf  ||u-v||.  (2.34) 

ve^2  (A) 

Our  problem  then  is  to  find  a  good  set  of  indices  A  such  that  the  Galerkin  solution 
Ua  G  ^(A)  is  a  good  approximation  to  u.  In  view  of  the  equivalences  (see  (2. 21), (2. 3), 
(2.20)) 

I|m  -  kaIIh  ~  l|t<  -  maIU  ~  l|u  -  U4|1(,(V)  ~  l|u  -  uaII,  (2.35) 

any  estimate  for  the  error  ||u  —  ua||  translates  into  an  estimate  for  how  well  the  Galerkin 
solution  Ua  from  the  wavelet  space  Sa  approximates  u. 
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3  A^-term  Approximation  and  Quasi-Sparse  Matrices 

We  have  seen  in  the  previous  section  how  the  problem  of  finding  Galerkin  solutions  to 
u  from  the  wavelet  space  S'a  is  equivalent  to  finding  Galerkin  approximations  to  u  from 
the  sequence  spaces  f'2(A).  This  leads  us  to  understand  first  what  properties  of  a  vector 

V  G  f'2(V)  determine  its  approximability  from  the  spaces  f'2(A).  It  turns  out  that  this  is 
a  simple  and  well  understood  problem  in  approximation  theory  which  we  now  review. 

3.1  A-term  Approximation 

In  this  subsection,  we  want  to  understand  the  properties  of  u  that  determine  its  approx- 
imability  by  a  ua  with  A  of  small  cardinality.  This  is  a  special  case  of  what  is  called 
Wterm  approximation  which  is  completely  understood  in  our  setting.  We  shall  recall  the 
simple  results  in  this  subject  that  are  pertinent  to  our  analysis. 

For  each  iV  =  1,2, . . .,  let  Eat  :=  □{^(A)  :  #A  <  N}.  Thus,  Eat  is  the  (nonlinear) 
subspace  of  1*2  (V)  consisting  of  all  vectors  with  at  most  N  nonzero  coordinates.  Given 

V  G  f'2(V),  V  =  (nA)Aev,  we  introduce  the  error  of  approximation 

(jAr(v)  :=  inf  ||v  -  w|1^2(v)-  (3.1) 

A  best  approximation  to  v  from  Eat  is  obtained  by  taking  a  set  A  with  <  N  on 
which  I^aI  takes  its  N  largest  values.  The  set  A  is  not  unique  but  all  such  sets  yield  best 
approximations  from  Eat.  Indeed,  given  such  a  set  A,  we  let  Pav  be  the  vector  in  Eat 
which  agrees  with  v  on  A.  Then 

CTAr(v)  =  ||v  -  Pav||^2(V)- 

We  next  want  to  understand  which  vectors  v  G  ^2(V)  can  be  approximated  efficiently 
by  the  elements  of  Eat.  For  each  s  >  0,  we  let  denote  the  set  of  all  vectors  v  G  f'2(V) 
such  that 

||v|U.  :=  sup(iV  +  l)VAr(v)  (3.2) 

v>o 

is  finite,  where  (Jo(v)  :=  ||n||^2(v)-  Thus  .4.^  consists  of  all  vectors  which  can  be  approxi¬ 
mated  with  order  0{N~^)  by  the  elements  of  Eat. 

It  is  easy  to  characterize  for  any  s  >  0.  For  this  we  introduce  the  decreasing 
rearrangement  v*  of  v.  For  each  n  >  1,  let  n*  be  the  n-th  largest  of  the  numbers  I^aI  and 
let  V*  :=  (n*)^;^.  For  each  0  <  r  <  2,  we  let  i^{V)  denote  the  collection  of  all  vectors 

V  G  ^2(V)  for  which 

|v|£»(v)  :=  supn^Av*  (3.3) 

n>l 

is  finite.  The  space  t')f(V)  is  called  weak  ir  and  is  a  special  case  of  a  Lorentz  seguence 
space.  The  expression  (3.3)  defines  its  quasi-norm  (it  does  not  in  general  satisfy  the 
triangle  inequality).  We  shall  only  consider  the  case  r  <  2  in  this  paper.  In  this  case 
^)f(V)  C  1^2 (V)  and  for  certain  notational  convenience,  we  define 

||v||£™(v)  :=  |v|^»(v)  +  ||v||£2(v)-  (3.4) 
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If  v,w  are  two  sequences,  then 


||v  +  w||£»(v)  <  C'(r)  (||v||£»(v)  +  ||w||£»(v))  ,  (3.5) 

with  C{t)  depending  on  r  when  r  tends  to  zero. 

We  have  v  G  if  and  only  if  n*  <  n  >  1,  and  the  smallest  such  c  is  equal 

to  ||v||£™.  In  other  words,  the  coordinates  of  v  when  rearranged  in  decreasing  order  are 
required  to  decay  at  the  rate  Another  description  of  this  space  is  given  by 

#{A  :  |va|  >  e}  <  ce"^  (3.6) 

and  the  smallest  c  which  satishes  (3.6)  is  equivalent  to 

Remark  3.1  An  alternative  description  of  is 

{v  G  f'2(V)  :  #{A  :  2~^  >  I^aI  >  j  G  ^  for  some  c  <  oo}. 

Moreover  the  smallest  such  c  is  equivalent  to 


We  recall  that  if(V)  contains  f'r(V),  and  we  trivially  have  n{v^y  <  I]n>i  IwT 
therefore 

|v|^»(v)  <  ||v||^^(v),  (3.7) 


Proposition  3.2  Given  s  >  0,  let  r  he  defined  by 

11  .  ^ 
--S  +  -.  (3.9) 

Then  the  sequence  v  belongs  to  if  and  only  if^rE  £f(V)  and 

||v|Us  ~  ||v||^™(v)  (3.10) 

with  constants  of  equivalency  depending  only  on  r  when  r  tends  to  zero  {respectively ,  only 
on  s  when  s  tends  to  oo).  In  particular,  if^E  IfiV),  then 

c^n(v)  <  C'||v||£»(v)n“*,  n  =  l,2,...,  (3.11) 

with  the  constant  C  depending  only  on  r  when  r  tends  to  zero. 
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For  the  simple  proof  of  this  proposition,  we  refer  the  reader  to  [29]  or  the  survey  [26]. 

Conditions  like  u  G  iri^)  or  u  G  are  equivalent  to  smoothness  conditions  on 

the  function  u.  We  describe  a  typical  situation  when  H  =  and  D  =  (2“*l^l5A,A')A,A'ev- 
Then,  the  condition  u  G  irCV)  is  equivalent  to  the  requirement  that  the  wavelet  coeffi¬ 
cients  A  G  V,  satisfy 

Aev  e  f'r(V).  (3-12) 

For  a  certain  range  of  s  (and  hence  r)  depending  on  the  smoothness  and  vanishing  mo¬ 
ments  of  the  wavelet  basis,  the  condition  (3.12)  describes  membership  in  a  certain  Besov 
space.  Namely  for  s  and  r  related  by  (3.9),  we  have 

u  G  4  iff  M  G  Bf+\Lr{n))  (3.13) 

with  Bp{Lp)  the  usual  Besov  space  measuring  “r  orders  of  smoothness  in  Lp  \  The  weaker 
condition  u  G  £^(V)  gives  a  slightly  larger  space  Xt-  endowed  with  the  (quasi)  norm 

ll^llx,  :=  i|u||£»(v)-  (3-14) 

In  view  of  (2.35),  the  space  X'^  consists  exactly  of  those  functions  u  whose  best  Wterm 
wavelet  approximation  in  the  energy  norm  produces  an  error  0(iV“®). 

3.2  Quasi-Sparse  Matrices 

In  this  subsection,  we  shall  consider  some  of  the  properties  of  the  quasi-sparse  matrices  A 
that  appear  in  the  discrete  reformulation  (2.17)  of  the  elliptic  equation  (2.5).  We  recall 
that  such  matrices  A  are  in  the  class  for  some  a  >  d/2,  j3  >  d  and  therefore  they 
satisfy  (2.30) 

We  begin  by  discussing  the  mapping  properties  of  matrices  B  G  We  denote  by 

||B||  the  spectral  norm  of  B.  We  shall  use  the  following  version  of  the  Schur  Lemma:  if 
for  the  matrix  B  =  (6A,A')A,A'ev  there  is  a  sequence  cua  >  0,  A  G  V,  and  a  positive  constant 
c  such  that 

X]  I^a.a'Icua'  <  ccua,  and  ^  \bx,y\‘^>^  <  ccua',  A,  A'  G  V,  (3.15) 

A'ev  Aev 

then  ||B||  <  c.  An  instance  of  the  application  of  this  lemma  to  the  classes  is  the 
following  result  (which  can  be  found  in  [32]). 

Proposition  3.3  If  a  >  d/2  and  {3  >  d  then  every  B  G  Aa^y  defines  a  bounded  operator 
on4(V). 

Proof:  We  apply  Schur’s  lemma  with  the  weights  (jJ\  =  A  G  V.  To  establish 

the  first  inequality  in  (3.15),  let  A  G  V  and  let  |A|  =  j.  Then,  using  the  estimate 
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Xl|A'|=j'(l  + -^0)  ^  ^  for  the  summation  in  space,  we  obtain 


u 


A'ev 


-■  i:  oj,,\b,,,>\  <  j:  i:  (1  +  d(x, xyr” 

i'>f)  lA'hj' 

j'>j  o<j'<j 

<  <oo. 

rsj  /  -J 


l>0 


A  symmetric  argument  confirms  the  second  estimate  in  (3.15)  proving  that  B  is  bounded. 
□ 


While  Proposition  3.3  is  of  general  interest,  it  does  not  tell  us  any  additional  infor¬ 
mation  when  applied  to  the  matrix  A  of  (2.17)  since  our  ellipticity  assumptions  (Al) 
already  implies  that  A  is  bounded  on  f'2(V). 

It  is  well-known  that  decay  estimates  of  the  type  (2.30)  form  the  basis  of  matrix 
compression  [8,  21,  35,  36].  The  following  proposition  employs  a  compression  technique 
which  is  somewhat  different  from  the  results  in  these  papers. 

Proposition  3.4  For  each  a  >  d/2,  [3  >  d  let 

l-l|  (3,16) 


assume  that  B  G  Aa,i3-  Then,  given  any  s  <  s* ,  there  exists  for  every  J  &  IN  a  matrix 
Bj  which  contains  at  most  2'^  nonzero  entries  in  each  row  and  column  and  provides  the 
approximation  efficiency 

||B -BjII  <  C'2-'^",  Jew.  (3.17) 

Moreover  this  result  also  holds  for  s  =  s*  provided  a  —  d/2  ^  j3  —  d. 


Proof:  Let  B  =  (&A,A')A,A'ev  be  in  Aa,y-  We  £x  J  >  0  and  we  first  apply  a  truncation  in 
scale,  defining  Bj  :=  (6A,A')A,A'ev  where 


h\,\'  := 


||A|-|A'||  <  J/d, 

0,  else. 


In  order  to  estimate  the  spectral  norm  ||B  —  Bj||,  we  can  employ  the  Schur  lemma  with 
the  same  weights  as  in  the  proof  of  Proposition  3.3.  As  in  that  proof,  we  obtain,  for  any 
A  e  V  and  |A|  =  j. 


^  “^A'  ^A, A'  —  ^A,A' 

= 

A' 

{A'  :  p-|A'||>J/d} 

< 

rsj 

^  2^-(a-dl2)l 

l>J/d 

< 

2” (o'— d/2)  J/d  ^ 

rsj 
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It  follows  that 


B-Bj||  <  2--^\ 

\  I  r\j 


(3.18) 


We  also  need  a  truncation  in  space  provided  by  the  new  matrix  Bj  ;  = 
where 


b\ 


A. A' 


h,,y,  <i(A,A')<2WW-|VI4(||A|_|A'||), 

0,  else, 


(^A,A')A,A'eV 


and  where  7(n)  is  a  polynomially  decreasing  sequence  such  that  Y^nlij^Y  <  Specifi¬ 
cally,  we  take  7(n)  :=  (1  -|- 

We  can  then  immediately  estimate  the  maximal  number  Nj  of  non- zero  entries  in  each 
row  and  column  of  Bj  by 


[j/d] 

^  <  2-^. 

1=0 

In  view  of  (3.18),  it  remains  only  to  prove  that  ||Bj  —  Bj||  <  2“'^^.  In  order  to 
estimate  the  spectral  norm  ||Bj  —  Bj||,  we  again  use  the  Schur  lemma  with  the  same 
weights.  For  each  j'  and  A  G  A,  we  have 

{A'  :  d(A,A')>R} 

Therefore,  for  any  A  G  V, 

[J/d] 

-'a-'E-'vI7a'-6a,a.|  < 

A'  1=0 

J 

=  2~^'^  \^-J{0-d-ds)/d  ^  2W-d)-{^-d/2)]l 

1=0 

In  the  case  where  {(3  —  d)  <  {a  —  d/2)  (resp.(/5  —  d)  >  [a  —  d/2)),  the  factor  on  the 
right  of  2“*'^  is  bounded  by  (resp.  with  C  a  constant 

independent  of  J  and  A.  Thus,  when  jS  —  d  ^  a  —  d/2,  we  obtain  the  desired  estimate  of 
||Bj  —  Bj||  for  all  s  <  s*.  On  the  other  hand,  when  f3  —  d  =  a  —  d/2,  this  factor  is  still 
bounded  by  a  fixed  constant  provided  s  <  s* .  □ 

Remark  3.5  In  the  case  that  the  matrix's  of  Proposition  3. 4  is  the  preconditioned  matrix 
representation  of  an  elliptic  operator  A  which  is  local  (he.,  supp  Afjx  C  supp'^A?  A  G 
V)  then  the  truncation  in  space  in  the  proof  of  this  proposition  is  not  needed  and  the 
Proposition  holds  for  ds  <  a  —  d/2. 
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3.3  Fast  Multiplication 

We  now  come  to  the  main  result  of  this  section  which  is  the  fast  computation  of  quasi- 
sparse  matrices  applied  to  vectors.  We  continue  to  denote  the  spectral  norm  of  a  matrix 
B  by  ||B||. 

We  have  seen  that  decay  estimates  like  (2.28)  imply  compressibility  in  the  sense  of 
Proposition  3.4.  To  emphasize  that  only  this  compressibility  (which  may  actually  hold 
also  for  other  operators  than  those  discussed  in  connection  with  (2.28))  matters  for  the 
subsequent  analysis  we  introduce  the  following  class  Eg  of  compressible  matrices. 

Definition  3.6  We  say  a  matrix  B  is  in  the  class  Eg  if  there  are  two  positive  sequences 
and  {Pj)j>o  that  are  both  summahle  and  for  every  j  >  0  there  exists  a  matrix  B^ 
with  at  most  2^aj  nonzero  entries  per  row  and  column  such  that 

||B  -  Bjll  <  2-^'*/3^-.  (3.19) 

We  further  define 

||B||g^  :=  minmax  <  ^  aj,  ^  (3,  (3.20) 

[j>o  j>o  ) 

where  the  minimum  is  taken  over  all  such  sequences  (oj)j>o  and  {Pj)j>o- 

We  record  the  following  consequence  of  Proposition  3.4. 

Corollary  3.7  Let  s*  be  defined  by  (3.16).  Then  for  every  0  <  s  <  s*  one  has 

X,/3  C  Eg.  (3.21) 

Note  that  the  sequences  (aj),  {Pj)  can  in  this  case  be  chosen  to  decay  exponentially  and 
that  ||B||g^  grows  when  s  approaches  s* . 

The  main  result  of  this  section  reads  as  follows. 

Proposition  3.8  If  the  matrix  B  is  in  the  class  Eg,  then  B  maps  f')),(V)  boundedly  into 
itself  for  1/r  =  1/2  +  s,  that  is,  for  any  v  G  we  have 

||Bv||^»(v)  <  C'||v||£™(v).  (3.22) 

with  the  constant  C  depending  only  on  ||B||g^  and  the  spectral  norm  ||B||. 

Proof:  Let  v  G  if(V)  and  for  any  j  >  0,  we  denote  by  V[j]  G  S2j  be  a  best  2'^-term 
approximation  to  v  in  ||  ■  ||£2(v)-  We  recall  that  V[j]  is  obtained  by  retaining  the  2^  biggest 
coefficients  of  v  and  setting  all  other  coefficients  to  zero.  Then,  from  Proposition  3.2,  we 
have 

~  ’'^[i]lk2(v)  <  C'2“-^*||v||£™(v)  (3.23) 

with  the  constant  depending  only  on  r.  Using  the  matrices  of  (3.19),  we  define 

Wj  :=  BjV[o]  +  Bj_i(v[i]  -  V[o])  +  ■  ■  ■  +  Bo(v[j]  -  vy_i]).  (3.24) 


18 


This  gives 


Bv  -  Wj  =  B(v  -  V[^-])  +  (B  -  Bo)(v[j]  -  +  ■  ■  ■  +  (B  -  Bj)v[o]. 

It  follows  then  from  the  summability  of  the  f3j  that 

||Bv- Wj||^,(v)  <  ||B||||v- V[^-]||^,(V) 

+  l|B  -Bo||||v[j]  -  v[j_i]||£2(v)  +  ■  ■■  +  ||B  -  Bj||||v[o]||£2(v) 

^  l|B|| ||v||^™(v)2  *-^  +  2  ^/5o||v||^»(v)2  ^^  +  ■  ■  ■  +  2  *'^'/5j||v[o] ||£2(v) 

<  2-*^||v||,»(v),  (3.25) 

where  for  the  last  term,  we  have  used  the  simple  inequalities  ||vo||^2(v)  <  ||vo|l£™(v)  < 

l|v||^»(v)- 

The  number  Nj  of  nonzero  entries  of  wj  is  estimated  by 

Nj  <  ajT  +  2aj_i2^”^  +  ■  ■  ■  +  2^ao  <  2T 

We  apply  now  Proposition  3.2  and  obtain  (3.22).  □ 

We  state  an  immediate  consequence  of  Corollary  3.7. 

Corollary  3.9  The  conclusions  of  Proposition  3.8  hold  for  any  matrix's  G  Aa,i3  provided 
s  <  min  {a/d  —  1/2,  P/d  —  1}  =  s*. 

Note  that  the  number  of  arithmetic  operations  needed  to  compute  wj  in  (3.24)  is 
estimated  as  Nj  above,  so  that  this  multiplication  algorithm  is  optimal.  This  is  stated  in 
the  following  corollary  in  which  we  also  reformulate  our  result  in  terms  of  a  prescribed 
tolerance. 

Corollary  3.10  Under  the  hypotheses  of  Proposition  3.8,  for  each  v  G  and  for 

each  e  >  0,  there  is  a  such  that 

||Bv  -  w,||^2(v)  <  e, 

and 

#suppw,  < 

with  s  and  r  related  as  in  (3.9).  Moreover,  the  approximation  can  be  computed  with 
C^llvll  arithmetic  operations.  In  both  of  these  statements,  the  constants  C  de¬ 

pends  only  on  ||B||g^  and  the  spectral  norm  o/B. 
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4  An  Adaptive  Galerkin  Scheme 

We  have  shown  in  §2,  that  the  elliptic  equation  (2.5)  is  equivalent  to  the  infinite  system 
of  equations  (2.17) 

Au  =  f,  (4.1) 

where  A  is  an  isomorphism  on  1*2 (V).  This  system  results  from  expanding  the  solution 
and  right  hand  side  of  (2.5)  in  a  primal  and  dual  wavelet  basis,  respectively,  and  then 
using  a  diagonal  preconditioning.  We  have  also  noted  in  that  section  that,  for  a  given 
set  A  C  V,  solving  (4.1)  with  trial  space  f'2(A)  is  the  same  as  solving  (2.5)  with  the  trial 
space  Sa- 

We  are  not  only  interested  in  rapidly  solving  the  linear  system  (2.32)  of  equations  for  a 
given  selection  A  of  basis  functions  for  the  trial  space  Sa  but  also  in  adaptively  generating 
possibly  economic  sets  A  needed  to  achieve  a  desired  accuracy.  Since  adaptive  approxima¬ 
tion  is  a  form  of  nonlinear  approximation,  it  is  reasonable  to  benchmark  the  performance 
of  such  an  adaptive  method  against  nonlinear  A-term  approximation  as  discussed  in  §  3. 
We  recall  that  the  results  of  §3.1  show  that  a  vector  v  can  be  approximated  with  order 
0{N~‘^)  by  A-term  approximation  (i.e.,  by  a  vector  with  at  most  A  nonzero  coordinates) 
if  and  only  if  v  G  f')l'(V),  r  :=  (s  -|-  1/2)“^.  We  shall  strive  therefore  to  meet  the  following 
goal. 

Goal:  Construct  an  adaptive  algorithm  so  that  the  following  property  holds  for  a  wide 
range  of  s  >  0.'  for  each  u  G  r  :=  {s  +  1/2)“^,  the  algorithm  generates  sets  Aj, 

j  =  1,2, . . .,  such  that  the  Galerkin  approximation  ua^.  to  u  provides  the  approximation 
error 

||u  -  uajW  <  C'||u|l^»(v)(#Aj)“*.  (4.2) 

Recall  that  this  goal  can  also  be  expressed  in  terms  of  achieving  a  certain  tolerance  with 
an  optimal  number  of  degrees  of  freedom  as  stated  in  (1.2)  and  (1.3). 

In  this  section,  we  shall  describe  a /irsf  adaptive  algorithm,  initially  developed  in  [17], 
for  solving  (4.1).  Starting  with  an  initial  set  Aq,  this  algorithm  adaptively  generates  a 
sequence  of  (nested)  sets  Aj,  j  =  1,2, . . ..  The  Galerkin  solutions  ua^  j  =  1,  2  . . .,  to  (4.1) 
provide  our  numerical  approximation  to  u  and  these  in  turn  determine  our  approximations 
UAj  to  the  solution  u  of  the  original  elliptic  equation  (2.5). 

At  present,  we  can  only  show  that  the  algorithm  of  this  section  meets  our  goal  for 
a  small  range  of  s  >  0  (see  Corollary  4.10).  Nevertheless,  this  algorithm  is  simple  and 
interesting  in  several  respects  and  the  analysis  of  this  algorithm  brings  forward  natural 
questions  concerning  Galerkin  approximations. 

In  §  5  we  shall  present  a  second  adaptive  algorithm  which  will  meet  our  goal  for  a 
natural  range  s*  >  s  >  0.  This  range  of  s  >  0  is  limited  only  by  the  decay  properties 
of  the  stiffness  matrix  A  which  in  turn  are  related  to  properties  of  the  wavelet  basis 
(smoothness  and  vanishing  moments)  and  the  order  of  A. 

The  analysis  we  give  in  this  and  the  following  section  for  these  adaptive  algorithms 
is  idealized  since  it  will  address  only  questions  of  approximation  order  in  terms  of  the 
cardinality  of  the  sets  Aj.  At  this  stage  we  shall  ignore  certain  computational  issues.  In 
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particular,  we  will  assume  that  we  are  able  to  access  the  values  of  possibly  infinitely  many 
wavelet  coefficients,  e.g.  of  residuals,  which  is  of  course  unrealistic.  However,  this  will 
facilitate  a  more  transparent  analysis  of  the  adaptive  algorithms  and  their  ingredients. 
Later  in  §  6-7  we  will  develop  corresponding  computable  counterparts  by  introducing 
suitable  truncation  and  approximation  procedures.  Moreover,  we  will  provide  a  complete 
analysis  of  their  computational  complexity. 

4.1  Algorithm  I 

The  idea  behind  our  first  adaptive  algorithm  is  to  generate  step  by  step  an  ascending 
sequence  of  (nested)  sets  Kj  so  that  on  the  one  hand  #(Aj  \  Aj_i)  stays  as  small  as 
possible,  while  on  the  other  hand,  the  error  for  the  corresponding  Galerkin  solutions  is 
reduced  by  some  fixed  factor,  that  is,  for  some  9  G  (0, 1)  one  has 

||u-ua,^J|  <  d||u-UAj.  (4.3) 

We  remind  the  reader  that  ||  ■  ||  :=  a(-,  ■)^A  jg  ttie  discrete  energy  norm  when  applied 
to  vectors.  The  Aj  will  be  generated  adaptively,  that  is  Aj  depends  on  the  given  data  f 
and  on  the  previous  solution  ua  .j. 

We  will  first  explain  the  basic  principle  that  has  been  already  used  in  [10,  17]  to 
guarantee  a  reduction  of  the  form  (4.3).  The  idea  is,  given  A,  find  A  containing  A  such 
that 

||ua  -  uaII  > /3||u  -  uaII  (4.4) 

holds  for  some  (3  G  (0, 1).  By  the  orthogonality  of  the  Galerkin  solutions  with  respect  to 
the  energy  inner  product,  (4.4)  implies 

||u  -  uaIP  =  ||u  -  +  ||ua  -  U^||A  (4.5) 

Hence  (4.4)  (applied  with  A  =  Aj  and  A  =  A^+i)  implies  (4.3)  with 

9  ;=  ^1  -  /52.  (4.6) 

Therefore,  our  strategy  is  to  establish  (4.4).  This  is  also  a  common  approach  in  the 
context  of  finite  element  discretizations,  see  e.g.  [10].  There  the  role  of  is  played  by 
an  approximate  solution  of  higher  order  or  with  respect  to  a  finer  mesh.  In  most  studies, 
however,  the  property  (4.4),  often  referred  to  as  saturation  property,  is  assumed  and  not 
proven  to  be  valid. 

We  shall  show  how  such  sets  A  can  be  selected.  For  this  we  shall  use  the  residual 

ta  ;=  Au  —  Aua  =  f  —  Aua.  (4.7) 

Since  ua  and  f  are  known  to  us,  the  coordinates  of  this  residual  can  in  principle  be 
computed  to  any  desired  accuracy.  We  leave  aside  the  issue  of  the  computational  cost  for 
a  given  accuracy  in  this  residual  until  §  6,  and  work  with  the  simplified  assumption  that 
we  have  the  exact  knowledge  of  its  coordinates. 

We  recall  the  orthogonal  projector  Pa  from  f'2(V)  to  f'2(A)  in  the  norm  ||  ■  ||£2(v)-  For 
V  G  f'2(V),  Pav  is  the  vector  in  i2{A)  which  agrees  with  v  on  A. 
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Lemma  4.1  Let  A  C  V  and  let  r/^  :=  f  —  Aua  be  the  residual  associated  to  A.  If 
0  <  a  <  1,  and  A  C  V  is  any  set  that  satisfies 


I|Pa’^a||£2(V)  >  «ikA|k2(V), 

then 

||Ua  -  UaII  >  fi\\u  -  UaII 

where  :=  and  Ci,  C2  are  the  constants  of  (2.21).  As  a  consequence, 

||u  -  u^ll  <  ^llu  -  UaII 


(4.8) 

(4.9) 


(4.10) 


with  9  := 


Proof:  From  (2.27),  we  have 


Uj 


Ua 


>  C2  ^/^||A(u^  -  Ua)|1^2(V)  >  C^^^^||A(u^  -  Ua)||^2(a) 
=  C2  ^/^||A(u  -  Ua)||^2(a)  =  C2  ^^^||PArA|k2(V) 

>  C2  ^'^^a||rA||^2(v)  =  C2  ^'^^a||A(u  -  ua)||^2(v) 

>  C2^^^Ci'^^q;||u  -  UaII, 


where  the  second  to  last  equality  uses  the  fact  that  Au  =  f,  and  Au^  agree  on  A.  This 
proves  (4.9)  while  (4.10)  follows  from  (4.5).  □ 


We  consider  now  our  first  algorithm  for  choosing  the  sets  Aj  in  which  we  take  0  =  1/2 
(similar  algorithms  and  analysis  hold  for  any  0  <  o  <  1).  We  introduce  the  following 
steps  which  will  be  part  of  our  adaptive  algorithms. 


GALERKIN:  Given  a  set  A,  GALERKIN  determines  the  Galerkin  approximation 
Ua  to  u  by  solving  the  finite  system  of  equations  (2.32). 

GROW:  Given  a  set  A  and  the  Galerkin  solution  ua,  GROW  produces  the  smallest 
set  A  which  contains  A  and  satisfies 

IIPa'^aII^zIV)  >  (4-11) 

We  note  that  the  set  A  is  obtained  by  taking  the  indices  of  the  largest  coefficients  of  fa; 
the  number  of  these  indices  to  be  chosen  is  determined  by  the  criterion  (4.11). 
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Algorithm  I: 

•  Let  Ao  =  0  and  =  f. 

•  For  j  =  0,1,2,.. .,  determine  Aj+i  from  Aj  by  first  applying  GALERKIN  (in  order 
to  find  UAj  )  and  then  applying  GROW. 

As  a  consequence  of  Lemma  4.1,  we  have  the  following. 

Corollary  4.2  For  the  sets  Aj  given  by  Algorithm  I,  the  corresponding  Galerkin  approx¬ 
imations  ua^.  of  u  satisfy 


||u-Ua^.+J|  <  d||u-UAj,  J  =  1,2,  .... 

where 


Conseguently, 


(4.12) 

(4.13) 


l|u- UajII  <  «^||u||,  j  =  l,2,....  (4.14) 

Proof:  The  inequality  (4.12)  follows  from  (4.10)  while  (4.14)  follows  by  repeatedly  ap¬ 
plying  (4.12).  □ 


4.2  Error  Analysis  for  Algorithm  I 

While  the  last  Corollary  shows  that  for  each  u  G  f'2(V),  the  sequence  {ua^  }  converges  in 
the  energy  norm  to  u,  we  would  like  to  go  further  and  understand  how  the  error  decreases 
with  ffAj.  In  particular,  we  would  like  to  see  if  this  algorithm  meets  our  goal  for  certain 
s  >  0.  We  begin  with  the  following  lemma. 

Lemma  4.3  Let  s  >  0,  let  A  he  in  the  class  Bg,  and  let  u  G  r  :=  {s  -\-  1/2)~L 

Given  any  set  A  G  V ,  let  A  G  V  be  the  smallest  set  such  that  A  G  A  and 

I|Pa’^a|1^2(v)  >  2  (4-15) 

Then  one  has 

where  C3  is  a  constant  depending  only  on  s  when  s  is  large. 

Proof:  We  will  make  frequent  use  of  the  following  simple  fact. 

Remark  4.4  Since  A  is  finite,  ua  is  in  £f(V).  By  assumption  u  G  £f(V)  and  hence 
u  —  ua  is  also  in  Applying  Proposition  3.8  we  see  that  fa  is  also  in  if(V). 
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Now,  for  any  iV  >  1,  let  A^v  denote  the  indices  of  the  N  largest  coefficients  of  fa  in 
absolute  value.  According  to  Proposition  3.2, 

Ikv  —  PA^rAil^aiV)  <  C'o||i'a|1£“(V)A^  (4-17) 

where  Cq  depends  only  on  s  when  s  is  large.  We  may  assume  that  Cq  >  1.  We  choose  N 
as  the  smallest  integer  such  that 

2C'olkA||£“(V)A^  ^  <  IkAll^alV); 

and  define  A  :=  A  U  Aat.  Then,  clearly  (4.15)  is  satisfied.  Moreover, 

^  ^  / 2C'o||rA||^y(v)\  I  2  { 

~  \  lkA|l£2(V)  /  “V  lkA||^2(V)  j 
and  so  (4.16)  is  also  satisfied.  □ 

Lemma  4.3  gives  our  first  hint  of  the  importance  of  controlling  the  f')f(V)  norms  of 
the  residuals  r a^  .  The  following  Theorem  and  Corollary  will  draw  this  out  more  and  will 
provide  our  first  error  estimate  for  Algorithm  I. 

Theorem  4.5  Let  s  >  0,  let  A  he  in  the  class  Eg,  and  let  u  G  r  :=  (s  +  1/2)~L 

Define 

e  := 


with  ci,C2  the  constants  of  ^2.4.  Then,  the  Galerkin  approximations  ua^.,  k  =  0,1,..., 
generated  by  Algorithm  I  satisfy 


||u-UAj|=C^(#Afc)-*, 

(4.18) 

where 

Cl  < 

(4.19) 

and  the  constants  Ck,  k  >  1,  satisfy 

Ck+i  <  9^/%Ck  +  C3cr'/'lrAj|;^(v)) 

(4.20) 

with  C3  the  constant  of  Lemma  4-3. 

Proof:  We  use  the  abbreviations  :=  ||u  —  ua^H,  :=  #Afc,  pk  ■=  ||rAfc||£“(v),  k  = 
0, 1, . . ..  The  constants  Ck,  /c  =  1,  2, . . .,  are  defined  by  (4.18).  For  any  k  >0,  we  know 


and  from  Lemma  4.3  and  (2.27),  we  obtain 

Nk+i  <Nk  +  C3Pfc^lA(u  -  UAjil7/(4")  <Nk  + 
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This  means  that  for  /c  >  1, 

C»+,  :=  iV,+ie;4-i  <  (iV,  +  +  CaC^-'^Vf). 

This  proves  (4.20).  The  same  argument  gives  (4.19)  because  po  =  l|f||£“(v)  and  iVo  =  0. 

□ 


Theorem  4.5  reveals  that  the  growth  of  the  constants  Ck  can  be  controlled  by  the  size 
of  the  residual  norms  ||rAfc  ||£“(v)-  The  following  Corollary  shows  that  if  these  norms  are 
bounded  then  so  are  the  constants  Ck- 

Corollary  4.6  If  the  hypotheses  of  Theorem  f.5  are  valid  and  in  addition 

||i'Afc||£“(v)  <  M),  /i;  =  0, 1,...,  (4.21) 

then 

Ct  <  C(||u||gy,  +  My ),  k  =  l,2...,  (4.22) 

with  C  a  constant  such  that  C®  depends  only  on  s  when  s  — ^  oo.  Consequently, 

||u  -  U^JI  <  C‘(aC  +  l|ul|,4yy(#A,)-‘'*.  (4.23) 

Proof:  We  use  the  same  notation  as  in  the  proof  of  Theorem  4.5.  We  define  M  :  = 
and  find 

Ck  < 

k-l 

<  +  M  ^  dTy 

j=i 

Now,  d  <  1,  and  from  (4.19)  and  Proposition  3.8 

<  ||f||^»(V)  <  ||u||£»(v). 

This  proves  (4.22).  The  estimate  (4.23)  then  follows  from  (4.18).  □ 


Remark  4.7  Corollary  f.6  shows  that  «/ ||rAj.||^»(v)  is  bounded  independently  of  k,  then 
we  are  successful  in  the  goal  that  we  have  fixed  in  the  beginning  of  this  section.  One  can 
also  check  that  optimality  is  achieved  in  the  sense  of  a  target  accuracy  e  >  0.'  Let  j(e) 
be  the  smallest  j  such  that  ||u  —  uaJI  <  e.  Then,  since  ||u  —  ua^.(^)_J|  >  e,  we  obtain 
the  estimate  from  (4.23).  From  (4.16),  we  also  derive  that  #(Aj(e)  \ 

Aj(,;)_i)  <  .  It  follows  that  we  have  the  desired  estimate  . 
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4.3  Bounding  ||rAj|^»(V) 

Corollary  4.6  shows  that  if  for  each  u  G  £^(V),  r  :=  (s+l/2)“^,  the  boundedness  condition 
(4.21)  holds  with  Mq  <  C'||u||^»(v),  then  the  algorithm  meets  our  goal  for  s  =  ^  —  |.  We 
can  give  sufficient  conditions  for  the  validity  of  (4.21)  in  terms  of  the  (hnite)  sections 


-^A  •  (®A,i/) A,!/SA 


(4.24) 


of  the  matrix  A.  Note  that  in  terms  of  these  sections  the  Galerkin  equations  (2.32)  take 
the  form 


AaUa  =  PAf, 


(4.25) 


where  according  to  our  convention  we  always  employ  the  same  notation  for  the  hnitely 
supported  vector  ua  and  the  inhnite  sequence  obtained  by  setting  all  components  outside 
A  to  zero.  Likewise,  depending  on  the  context,  it  will  be  convenient  to  treat  Pav  for 
V  G  f'2(V)  either  as  an  inhnite  sequence  with  zero  entries  outside  A  or  as  a  hnitely 
supported  vector  dehned  on  A. 

Recall  also  from  (2.26)  that  the  ellipticity  of  A  implies  the  boundedness  of  Aa  and  its 
inverse  in  the  spectral  norm,  uniformly  in  A.  Also,  from  Proposition  3.8,  it  follows  that 
A  is  a  bounded  operator  on  ^(^(V).  Therefore,  the  matrices  Aa  are  uniformly  bounded 
(independently  of  A)  on  ^)f(A)  (where  f')f(A)  is  dehned  in  analogy  to 

Remark  4.8  Under  the  assumptions  of  Lemma  f.S,  if  the  inverse  matriees  Af}  are 
uniformly  bounded  on  if  {A),  i.e., 

sup  ||A)^V|1^»(a)  <  Ml,  Ac  V,  (4.26) 

(A)<1 


with  Ml  >  1,  then 

lkA||^»(v)  <  C'Mi||u||^»(v),  AcV. 

with  the  constant  C  independent  of  A. 


(4.27) 


Proof:  By  assumption  u  G  f')f(V).  From  Proposition  3.8,  we  hnd  that  f  is  also  in  £f{V) 
and  for  all  A, 

||PAf||£“(A)  <  ||f||£“(V)  <  C'l||u|l^»(V), 

where  Ci  is  the  norm  of  A  on  if{'V).  By  our  assumptions  on  Aa^,  we  derive  that 


This  gives 


||ua||^»(v)  =  ||ua||£™(a)  <  Mi||PAf ||^™(v)  <  C'iMi||u||^»(v). 

lkA||^™(V)  <  C'lllu  —  Ua||£™(v) 

<  C2  (^||u||^»(v)  +  ||ua||^»(v))  <  <^2(1  +  C'iMi)||u||^™(v), 


which  implies  (4.27). 


(4.28) 

□ 


There  is  a  soft  functional  analysis  argument  which  shows  that  the  boundedness  con¬ 
dition  (4.26)  is  satished  for  a  certain  range  of  r  close  to  2. 
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Theorem  4.9  Let  A  G  Bs^  for  some  Sq  >  0.  Then  there  is  a  0  <  t  <  2  and  a  constant 
C  >  0  such  that  for  all  A  dV  and  all  f  <  r  <2, 

ll"^A^Ik?’(v)^£“(v)  <  C.  (4.29) 

Proof:  First  recall  from  (2.26)  that  the  condition  numbers  of  the  matrices  Aa  statisfy 
ka  <  K  for  any  A  C  V.  Let  Ba  :=  where  =  IAaH+IA^  11 — ^  Then,  Ba  =  I  —  Ra 

where  HRaH  <  ^. 

Now  let  To  :=  (sq  +  l/2)~^.  Then,  both  I  and  A  are  bounded  on  t'^(V).  Hence,  we 
have  ||Ra||^»  (A)^£“  (A)  <  C'o  for  some  positive  constant  Cq  independent  of  A.  Using  the 
Riesz-Thorin  interpolation  theorem  for  t'2(A)  and  if  {A),  we  can  hnd  some  f  <  2  such 
that  ||Ra  hr  <ro  <1,  uniformly  in  A  and  r  <  r  <  2.  By  the  standard  Neumann  series 
argument,  we  obtain  (4.29).  □ 


Corollary  4.10  // A  G  for  some  Sq  >  0,  then  there  is  an  s  >  0  such  that  Algorithm 
I  meets  our  goal  for  all  0  <  s  <  s .  That  is,  for  each  u  G  if(V),  with  y  —  |  s  <  5, 
Algorithm  I  generates  a  sequence  of  sets  Aj,  j  =  1,  2, . . .,  such  that 

||u-ua,.||  <  C'||u||£»(v)(#Aj)“",  j  =  1,2,...  (4.30) 


with  C  a  constant. 


Proof:  From  Theorem  4.9,  there  is  a  f  <  2  such  that  (4.29)  holds  uniformly  for  all 
f  <  T  <  2  and  A  C  V.  Remark  4.8  then  shows  the  validity  of  (4.27).  We  now  apply 
Corollary  4.6  and  obtain  (4.30)  from  (4.23).  □ 


We  close  this  section  by  making  some  observations  about  the  growth  of  ||rA||^y(v) 
and  ||uA||^y(v)  for  an  arbitrary  range  of  s  which  is  only  limited  by  the  properties  of  the 
wavelet  bases.  We  shall  use  these  observations  in  the  following  section  when  we  modify 

Algorithm  I. 

Lemma  4.11  Suppose  that  u  G  ifiV)  and  r  =  {s  +  1/2)"^  with  s  >  0.  Then,  for  any 
A  C  V  one  has 

||uA||^y(v)  <  C4  (^||u||£™(v)  +  (#A)*||u  —  ua|1£2(v))  (4-31) 

with  the  constant  C4  depending  only  on  r  when  r  tends  to  0. 

Proof:  First  note  that  if  v  G  ^'2(^)5  then  v  has  at  most  ffA  nonzero  coordinates.  Using 
(3.8)  and  Holder’s  inequality  gives  for  such  v  the  inverse  estimate 


|v|£y(V)  < 


(#A)"  2  <  (^A)'’||v||£2(a)- 


(4.32) 
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Now  let  Utv  denote  the  best  iV-term  approximation  to  u  which  we  recall  is  obtained  by 
retaining  the  N  largest  coefficients.  Invoking  the  direct  estimate  from  Remark  3.2,  we 
use  (4.32)  to  conclude  that 

|ua|£™(V)  <  C  (^|ua  —  U^a|£“(V)  +  |u#a|£“(V)) 

<  C((2#A)”||  Ua  -  U^aII^jCV)  +  |u|£™(v)) 

<  C((2#A)*||  Ua  —  u||£2(v)  +  ||u||£™(v))  ,  (4.33) 

where  we  have  used  (3.11)  of  Proposition  3.2.  We  add  ||ua||^2(v)  to  both  sides  of  (4.33) 
and  observe  that 

||ua||£2(V)  <  C'||u|1^2(V)  <  C'||u||£»(v) 

to  finish  the  proof.  □ 


We  next  apply  this  lemma  to  bound  residuals. 

Lemma  4.12  Let  s  >  0,  let  A  e  Eg  and  let  the  solution  u  to  (4.1)  be  in  t')f(V).  For  any 
index  set  Ak  generated  by  Algorithm  I,  we  have 

||rAfc+ilk»(v)  <  cs  (||u||£»(v)  +  ||rAj|^»(v))  ,  k  =  l,2,...  (4.34) 

with  the  constant  cs  independent  of  k  and  u. 

Proof:  The  algorithm  determines  the  set  A^+i  from  in  the  same  way  for  each  k  = 
1,2,.. ..  Therefore,  we  can  assume  that  k  =  1.  By  (4.31)  we  have 


||UA2|k“(V)  <  C4  (||u||^»(v)  +  (#A2)*||u  -  UA2|l£2(V))  • 

We  use  (2.21)  and  Theorem  4.5  to  bound  the  second  term: 

(#A2)'||u  -  Ua2||^2(V)  <  c7^/^(#A2)iu  -  UA2II  =  <  ||f|l^»(V)  +  ||rAi|k»(V)- 

Because  of  Proposition  3.8  we  can  replace  ||f||^»(v)  by  C'||u||^»(v). 


□ 


5  A  Second  Adaptive  Algorithm 

In  this  section,  we  shall  present  a  second  adaptive  algorithm  which  will  meet  our  goal 
for  the  full  range  of  s  >  0  that  is  permitted  by  the  wavelet  basis.  We  begin  with  some 
heuristics  which  motivate  the  structure  of  the  second  algorithm. 

The  deficiency  of  Algorithm  I  of  the  last  section  is  that  it  is  only  proven  to  meet 
our  goal  for  a  small  range  of  s  >  0.  This  in  turn  is  caused  by  our  inability  to  prevent 
the  possible  growth  of  ||rAfc|l£™(v)  as  k  increases.  Since  by  assumption  u  G  f')f(V),  growth 
in  ||rAj,||^»(v)  can  only  occur  if  ||uAfc|l£™(v)  gets  large  with  k.  On  the  other  hand,  we 
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know  that  ||uAfc||£2(v)  sjce  bounded  uniformly.  Typically,  for  a  vector  v,  its  norm 

is  much  larger  than  its  ^2(V)  norm  when  v  has  many  small  entries  which  do  not  effect 
its  f'2(V)  norm  but  combine  to  have  a  serious  effect  on  the  norm.  We  can  try  to 

prevent  this  from  happening  by  thresholding  the  coefficients  in  v  and  keeping  only  the 
large  coefficients.  In  our  application  to  ua^  ,  this  is  very  hopeful  since  the  large  coefficients 
contain  the  main  source  of  the  approximation  to  u. 

Motivated  by  the  above  heuristics,  we  would  like  to  use  thresholding  in  our  second 
algorithm.  We  introduce  the  thresholding  operator  7^  which  for  77  >  0  and  a  sequence 
V  :=  is  defined  by 


(V) 


A  :  — 


v\  if  IuaI  >  rj; 
0  if  IuaI  <  V- 


We  shall  use  the  following  trivial  estimates  for  thresholding  (see  §7  of  [26]):  for  any 
V  G  we  have 

l|v-^r,v|ll(V)  =  (5-1) 

|daI<»? 


and 

#{A  :  IuaI  >v}<  C6||v||[»(v)^"^  (5.2) 

with  ce  >  1  a  constant  depending  only  on  r  as  r  — ^  0. 


Lemma  5.1  Suppose  that  v  G  0  <  r  <  2,  and  that  w  G  f'2(V)  satisfies 

l|v- w||£2(v)  <  e  (5.3) 

for  some  e  >  0.  Then,  for  any  rj  >  0,  we  have 

||v  -  7;,w||£2(v)  <  2e  +  2c6||v||^i(^)r/^~^/^  (5.4) 

and  ^ 

#{A  G  V  :  (7;w)a  ^  0}  <  ^  +  4c6||v||[„(v)7/”^.  (5.5) 

Proof:  Let  z  :=  %jW  and  consider  the  sets  Ai  :=  {A  :  IwaI  >  ^2  ■=  {A  :  IwaI  < 

Tj,  and  I^aI  >  2?7},  A3  :=  {A  :  |u!a|  <  V  I^aI  <  277}.  Then, 

l|v-z|||(v)  =  ^  ^A-^aI^T  It'A-^Al^ 

AGA1UA2  AgAs 

<  4  53  -  waI^  +  53 

AgV  \v\\<2ri 

<  4e2  +  4c2||v||[„(v)r/^"", 


where  we  used  (5.1)  and  the  fact  that  |va|  <  2|va  —  wa|  for  A  G  A2.  This  proves  (5.4). 
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For  the  proof  of  (5.5),  we  consider  the  two  sets  A4  :=  {A  :  |wa|  >  r/  and  |va|  >  '?//2} 
and  A5  :=  {A  :  |w;v|  >  V  and  |v;v|  A  v/‘^}-  Then,  from  (5.2), 


#A4  <  #{A  :  |va|  >  r//2}  <  2^C6||v||^»(v)r/  ^  <  4c6||v|lJ»(v)r/  ^ 

and 

(V2)^(#A5)  <  ^  |va  -  WAp  < 

AgAs 

which  proves  (5.5).  □ 


We  shall  nse  onr  previons  notation  which  for  an  integer  A^  >  0  and  a  vector  w  G  f'2(V) 
defines  wtv  as  the  vector  whose  N  largest  coordinates  agree  with  those  of  w  and  whose 
other  coordinates  are  zero. 


Corollary  5.2  Suppose  that  v  G  0  <  r  <  2,  and  that  w  G  f'2(V)  satisfies 

l|v- w||£2(v)  <  e  (5.6) 

for  some  e  >  0.  Let  N  :=  N{e)  be  chosen  as  the  smallest  integer  such  that 


Then, 

and 


W  -  WAr||^2(v)  <  4e 
|v  -  WAr||£2(V)  <  5e 


||v  -  WAr||^2(V)  <  C7||v||^»(V)W  *, 

with  -s  —  I  o-nd  C7  a  constant  depending  only  on  s  as  s  — ^  00. 


(5.7) 

(5.8) 

(5.9) 


Proof:  We  clearly  have  (5.8).  To  prove  (5.9),  we  shall  give  a  bonnd  for  N. 

In  the  case  where  ||w||£2(v)  A  4e,  we  trivially  have  (5.7)  with  A^  =  0  and  w^v  =  0. 
In  the  case  where  ||w||^2(v)  >  4e,  let  rj  be  the  absolnte  valne  of  the  smallest  nonzero 
coefficient  in  w^r.  For  any  p'  >  p,  we  have 


||w  -  7;,,w|1^2(v)  >  4e.  (5.10) 

On  acconnt  of  (5.4),  we  have 

||w  -  7;/w||^2  -  l|v  -  w||^2(v)  <  ||v  -  7;,'w||^2(v)  <  2e  +  2c6||v||[i(^)(r/y"^/^ 
so  that  (5.6)  and  (5.10)  ensnre  that 

e  <  2c6i|v||;i(^)(r/y-^^^  (5.11) 

holds  for  all  p'  >  p.  Therefore, 

e  <  2c6||v||;i(^)r/^-"/2  =  2ce\\vf^i‘ly-^p^T  (5.12) 
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On  the  other  hand,  from  (5.5)  we  find 


4e2 

N  <  #{A  e  V  :  (T,w)j  ^  0}  <  .^  +  4c6||v||J.,y|i( 


(5.13) 


We  use  (5.12)  to  estimate  each  of  the  two  terms  on  the  right  of  (5.13).  For  example,  for 
the  second  term,  we  have 

4c6||v||[»(v)r/~^  <  2(2c6)^+^/"||v|l[l^(Jf  =  2(2c6)^+^/"||v|lJ^"^)e-^/h  (5.14) 

A  similar  estimate  shows  that  the  first  term  on  the  right  of  (5.13)  does  not  exceed 
4(2c6)^^^'^'^||v||]^^(y^e“^/^.  In  other  words. 


N  <  6(2c6)‘+‘''*||v||ji‘,vf  1  =  (c7/5)'''*||v||J^‘v)E 


(5.15) 


where  the  last  equality  serves  to  define  C7.  When  this  estimate  for  N  is  used  in  (5.8),  we 
arrive  at  (5.9).  □ 


Algorithm  II  will  modify  Algorithm  I  by  the  introduction  of  the  following  step: 


COARSE:  Given  a  set  A  and  a  Galerkin  solution  ua  associated  to  this  set,  take  e  :  = 
||rA||^2(v)  o-nd  apply  Gorollary  5.2  with  v  :=  u  and  w  :=  ua  to  produce  the  vector 
Then,  COARSE  produces  the  set  A  of  indices  for  the  nonzero  coordinates  ofw^  and 
then  applies  GALERKIN  to  this  new  set  to  obtain 

Remark  5.3  If  A  is  any  set,  it  follows  from  Gorollary  5.2  that  the  input  of  A  into 
COARSE  yields  a  set  A  with  a  Galerkin  solution  which  satisfies 

||u  -  u;^||  <  C8||u||^»(v)(#A)“*,  (5.16) 

where  Cg  =  with  C2  from  (2.21)  and  C7  from  (5.9). 

Proof:  We  have 

||u  -  u^ll  <  ||u  -  Wai||  <  C2'^^||u  -  Wai||£2(V),  (5.17) 

because  the  Galerkin  projection  gives  the  best  approximation  ua  to  u  from  f'2(A)  in  the 
energy  norm.  We  bound  the  right  side  of  (5.17)  by  (5.9).  □ 

Remark  5.4  It  also  follows  from  Gorollary  5.2  together  with  Lemma  f.ll  that  the  input 
of  A  into  COARSE  yields  a  set  A  with  a  Galerkin  solution  such  that 

||ua||£™(v)  <  C9||u||^»(v)  (5.18) 

with  the  constant  cg  depending  only  on  r  as  r  ^  0.  Of  course,  this  also  implies  that 
IkAlkrlv)  ^  l|u||^?’(v)-  Thus,  the  thresholding  step  allows  the  control  of  the  if  {'V)  norm 
of  the  residual. 
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We  can  now  describe  Algorithm  II. 


Algorithm  II: 

•  Let  Ao  =  0  and  rA^  =  f. 

•  For  j  =  0,1,2,...,  determine  Aj+i  from  Kj  as  follows.  Let  :=  Kj.  For  k  = 

1,2,...  determine  Aj^k  from  Aj^k-i  by  applying  GALERKIN  and  then  GROW 
to  Aj^k-i-  Apply  COARSE  to  Aj^k  to  determine  Aj^k  and  If  IkA^  j,lk2(v)  < 

|||rA.|l£2(v)  then  define  A^+i  :=  Aj^ki  kj  :=  k,  and  stop  the  iteration  on  k.  Otherwise 
advance  k  and  continne. 

Theorem  5.5  // A  G  Bs,  for  some  s  >  0,  then  Algorithm  II  satisfies  our  goal  for  this 
s.  Namely,  if  u  E  then  the  algorithm  produces  sets  Aj,  j  =  1,2, . . such  that 

||u  -  uaJI  <  C8(#Aj)“^||u||£»(v)  (5.19) 

with  cs  the  constant  of  Remark  5.3.  In  addition,  for  j  =  1,2, . . .,  we  have 

i|u  -  ua,.  Il£2(v)  <  cb^C2||u|1^2(v)2”^  (5.20) 

with  Cl  and  C2  the  constants  of  (2.21). 

Proof:  Since  the  set  Aj  is  the  ontpnt  of  COARSE,  the  estimate  (5.19)  follows  from 
Remark  5.3.  By  the  definition  Aj+i  :=  Aj^kj  in  Algorithm  II,  we  have 

IkAj+ilkaCV)  =  lkA,-fcJk2(V)  <  2  II’^aJ^2(V)-  (5-21) 

Iterating  this  ineqnality,  we  obtain 

llrAjIkaCV)  <  2~^||rAoll£2(v)  =  2”^||f||£2(v)  <  C22”-^||u||£2(v)- 

Since,  ||u  -  uaJ|^2(v)  <  Ik2(v),  we  arrive  at  (5.20).  □ 

The  following  remark  will  be  important  in  the  following  section  on  nnmerical  compn- 
tation.  It  shows  that  the  intermediate  steps  between  Aj  and  A^+i  do  not  generate  sets 
Aj^k  which  might  be  very  large  in  comparison  to  Aj  and  Aj+i. 

Remark  5.6  Under  the  assumptions  of  Theorem  5.5  we  have 

kj<K,  j  =  l,2,...,  (5.22) 

with  K  the  smallest  integer  such  that  bcf'^c^O^  <  1/2  where  ci,  c^  are  the  constants  of 
(2.21)  and  9  is  given  by  (4.13). 
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Proof:  This  follows  from  the  following  string  of  inequalities,  where  we  denote  by  the 
intermediate  output  of  COARSE  obtained  by  thresholding  ua^.  j.  before  computing  the 
new  Galerkin  solution: 


J|£2(V)  <  C2||u-U^._J|£2(V) 

<  C2C^^/^||u-U^,  J| 

<  C2C^^'^^||u  —  W^ll 

<  C2C^^/^C2^^||u- W^||£2(V) 

<  5cic7^/^||u-UA^.J| 

<  hclci^^'^O^Wvi  -  uaJ 

<  5cr^c|d^||rAj^2(v)- 

The  hfth  inequality  follows  from  (5.8)  and  the  fact  that  e  =  ch^||rA' fc||^2(v)  in  fh®  applica¬ 
tion  of  COARSE  to  Aj  fc.  All  other  inequalities  use  norm  equivalences  of  the  type  (2.21). 
From  this  estimate  we  see  that  the  criterion  ||rA  fcil^2(v)  <  ilkA  ||£2(v)  is  met  whenever 
k>  K.  j.  j  ^ 

Note  that  this  remark,  combined  with  Lemma  4.12,  has  the  following  consequence. 

Remark  5.7  The  residuals  in  the  intermediate  steps  are  also  uniformly  hounded  in 
£f(V)  and  that  the  cardinalities  #Afcj  can  always  be  controlled  by  #Aj.  The  intermediate 
steps  remain  within  our  goal  of  optimal  accuracy  with  respect  to  the  number  of  parameters. 

Theorem  5.5  shows  that  Algorithm  II  is  optimal  for  the  full  range  of  s  permitted  by 
the  wavelet  bases.  By  the  same  considerations  as  in  Remark  4.7,  this  algorithm  is  also 
optimal  in  the  sense  of  achieving  a  prescribed  tolerance  e. 

6  Numerical  Realization:  Basic  Ingredients 

The  previous  sections  have  introduced  and  analyzed  the  performance  of  two  adaptive 
methods  for  resolving  elliptic  equations.  The  analysis  however  was  more  from  a  theoretical 
perspective  and  did  not  incorporate  computational  issues.  Our  purpose  is  now  to  address 
these  computational  issues.  More  precisely,  we  want  to  develop  a  numerically  realizable 
version  of  Algorithm  II  and  to  analyze  its  complexity.  In  the  present  section,  we  shall 
introduce  the  basic  subroutines  that  constitute  the  resulting  Algorithm  III  which  will 
be  described  and  analyzed  in  the  hnal  section. 

Let  us  hrst  explain  the  basic  principle  of  Algorithm  III.  This  algorithm  will  itera¬ 
tively  generate  a  sequence  of  index  sets  Aj  and  approximate  solutions  ua^  supported  in 
Aj  (uAj.  differs  in  general  from  the  Galerkin  solution  ua^),  with  the  property  that 

i|u  -  ua,.  ||£2(v)  <  G  :=  2"-^eo,  (6.1) 
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where  eo  is  an  estimate  from  above  of  ||u||^2(v)  (which  will  allow  us  to  take  as  an  admissible 
starting  point  uaq  =  0  and  Aq  empty).  This  progression  toward  finer  accuracy  will 
be  performed  by  the  main  subtroutine  PROG  which  will  be  assembled  in  §7  from  the 
ingredients  that  we  shall  introduce  in  the  present  section. 

If  we  are  given  a  tolerance  e  that  gives  the  target  accuracy  with  which  we  wish  to 
resolve  the  solution  to  (2.17),  we  shall  thus  need  J  applications  of  PROG  where  J  is  the 
smallest  integer  such  that  ej  <  e. 

We  then  ask  what  the  total  computational  cost  will  be  to  attain  this  accuracy.  There 
will  be  two  sources  of  computational  expense:  arithmetic  operations  and  sorting.  Arith¬ 
metic  operations  include  additions,  multiplications,  and  square  roots.  We  shall  ignore 
error  due  to  roundoff.  We  shall  estimate  the  number  of  arithmetic  computations  A^(e) 
and  the  number  of  sorts  N (e)  needed  to  achieve  this  accuracy.  We  shall  see  that  N (e)  can 
be  related  to  e  in  the  same  way  that  the  error  analysis  of  the  preceding  section  related 
error  to  the  size  of  the  sets  Aj.  The  sorting  will  introduce  an  additional  logarithmic  factor. 

Our  subroutines  will  be  described  so  as  to  apply  to  any  vectors  and  thereby  Algorithm 
III  will  allow  us  to  solve  (2.17)  for  any  right  hand  side  f.  However,  we  shall  analyze  its 
performance  only  when  u  is  in  ^(^(V),  t  :=  {s  + 1/2)~^,  for  some  s  >  0  in  the  same  range 
of  optimality  as  for  Algorithm  II.  Note  that  this  range  is  limited  only  by  s*  in  (3.16), 
i.e.  the  compressibility  order  of  the  operator  in  the  wavelet  bases. 

Our  analysis  will  show  that  if  u  has  the  smoothness,  then  the  computational  cost 
and  memory  size  needed  to  achieve  accuracy  ej  is  controlled  by  so  that  the 

last  step  J  —  1  I— J  dominates  the  overall  computational  cost.  This  should  be  compared 
to  the  optimality  analysis  of  the  full  multigrid  algorithm,  for  which  the  complexity  is  also 
dominated  by  the  last  step  of  the  nested  iteration.  However,  in  the  multigrid  algorithm, 
each  step  of  the  nested  iteration  is  associated  to  a  uniform  discretization  at  a  scale  j,  which 
corresponds  to  imposing  that  Aj  is  the  set  of  all  indices  |A|  <  j,  rather  than  an  adaptive 
set.  In  this  case,  the  new  layer  Aj+i/Aj  updating  the  computation  thus  corresponds  to  a 
scale  level,  while  in  our  adaptive  algorithm  it  is  rather  associated  to  a  certain  size  level  of 
the  wavelet  coefficients  of  u.  Accordingly  the  classical  Sobolev  smoothness  which  enters 
the  analysis  of  multigrid  algorithms  is  replaced  by  the  weaker  Besov  smoothness  expressed 
by  the  t')!'  property. 

Algorithm  III  will  involve  numerical  versions  of  procedures  like  GROW,  COARSE 
or  GALERKIN.  In  these  subroutines  exact  calculations  will  have  to  be  replaced  by  ap¬ 
proximate  counterparts  whose  accuracy  is  controlled  by  corresponding  parameters.  Thus 
the  input  will  consist  of  the  objects  like  index  sets  or  vectors  to  be  processed  as  well  as 
control  and  threshold  parameters.  To  keep  track  of  these  parameters  and  their  interde¬ 
pendencies  we  will  consistently  use  the  following  format  for  such  subroutines 
NAME  [/Pi, . . . ,  IPe]  — ^  (GPi, . . . ,  OPr),  meaning  that  given  the  input  /Pi, . . . ,  I Pi 
the  procedure  NAME  generates  output  quantities  OPi, . . . ,  OPr. 

Some  of  the  subroutines  will  make  use  of  estimates  of  several  constants  like  Ci,  C2  from 
previous  sections,  in  which  case  we  shall  specify  them  and  explain  how  such  estimates  can 
be  obtained.  All  other  constants  entering  the  analysis  of  the  algorithm  but  not  its  concrete 
implementation  will  be  denoted  by  C  without  further  distinction.  Their  specific  value  only 
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affects  the  constant  in  onr  asymptotic  estimates.  In  particnlar,  they  are  independent  of 
the  data  f  the  solntion  u  or  its  varions  approximations.  If  necessary  their  dependence  on 
the  parameters  r  or  s  will  be  explained. 


6.1  The  Assembly  of  / 

We  shall  take  the  viewpoint  that  we  have  complete  knowledge  abont  the  data  /  in  the 
sense  that  we  already  know  or  can  compnte  its  wavelet  coefficient  to  any  desired  accnracy 
by  an  appropriate  qnadratnre.  This  in  tnrn  enables  ns  to  approximate  f  to  any  accnracy 
by  a  finite  wavelet  expansion.  We  formnlate  this  as 

Assumption  Nl:  We  assume  that  for  any  given  tolerance  rj  >  0,  we  are  provided  with 
the  set  A  :=  A(/,  r/)  of  minimal  size  such  that  f  =  PAf  satisfies 

l|f-f||fe(v)<i(.  (6.2) 


For  the  pnrpose  of  onr  asymptotic  analysis,  we  conld  actnally  replace  “minimal”  by 
“nearly  minimal”,  in  the  sense  that  the  following  property  holds:  if  f  is  in  t'^(V)  for  some 
r  <  2,  then  we  have  the  estimate 

#(A)  <  (6-3) 

with  s  =  l/2  —  1/r  and  C  a  constant  that  depends  only  on  s  as  s  tends  to  +oo.  This 
modified  assnmption  is  mnch  more  realistic,  since  in  practice  one  can  only  have  approxi¬ 
mate  knowledge  of  the  index  set  corresponding  to  the  largest  coefficients  in  f ,  nsing  some 
a-priori  information  on  the  smooth  and  singnlar  parts  of  the  fnnction  /.  However,  in 
order  to  simplify  the  notation  and  analysis  in  what  follows  we  shall  assnme  that  the  set 
A  is  minimal. 

In  the  implementation  of  Algorithm  III,  the  above  tolerance  rj  will  typically  be 
related  to  the  target  accnracy  e  of  the  solntion  by  a  fixed  mnltiplicative  constant.  We 
perform  the  following  two  preprocessing  steps  on  f; 

(i)  sort  the  entries  in  f  to  determine  the  vector  A*  =  (Ai,  A2, . . . ,  Ajy)  of  indices  which 

gives  the  decreasing  rearrangement  f*  =  (I/aJ,  I/A2I)  •  •  •  >  I/ajvD-  The  cost  of  this 
operation  is  in  0{N  log  N)  operations. 

(ii)  compnte  :=  ||f  ll^2(v)  +  ^  Xlili  lAP  +  The  cost  of  this  is  2N  —  1  arithmetic 

operations. 

The  second  step  gives  ns  the  estimate  ||f||^2(v)  <  F.  We  store  F  and  the  vector  A*. 

6.2  A  Numerical  Version  of  COARSE 

Algorithm  III  will  also  make  nse  of  less  accnrate  approximations  of  f  in  its  intermediate 
steps.  This  is  one  instance  of  the  freqnent  need  to  provide  a  good  coarser  approximation 
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to  a  finitely  supported  vector.  Such  approximations  will  be  generated  by  the  routine 
NCOARSE  that  we  shall  now  describe. 

NCOARSE  [w,r/]  ^  (A,w); 

(i)  Define  N  :=  7^(suppw)  and  sort  the  nonzero  entries  of  w  into  decreasing  order. 

Thereby  one  obtains  the  vector  \*  :=  A*(w)  =  (Ai,  A2,  •  •  • ,  Aat)  of  indices  which 
gives  the  decreasing  rearrangement 'w*  =  (|waJ,  IwajI,  • ' '  >  Iwaj^I)  of  the  nonzero 
entries  o/w;  then  compute  |wAip. 

(ii)  For  /c  =  1,  2,  •  •  •,  form  the  sum  I^Aj  P  in  order  to  find  the  smallest  value  k  such 

that  this  sum  exceeds  For  this  k,  define  K  :=  k  and  set  A  :=  {Aj  ;  j  = 

1,  •  •  • ,  A};  define  w  by  w\  :=  w\  for  A  G  A  and  wx  :=  0  for  A  ^  A. 

We  first  describe  the  computational  cost  of  NCOARSE. 

Properties  6.1  For  any  w,  and  p,  we  need  at  most  2N  arithmetic  operations  and 
NlogN  sorts,  N  :=  fi={suppw),  to  compute  the  output  w  0/ NCOARSE  which,  by 
construction,  satisfies 

l|w  -  w||£2(v)  <  ^7-  (6.4) 

We  shall  also  apply  NCOARSE  to  the  initial  approximation  f  of  the  data  in  order  to 
produce  other  near  optimal  A-term  approximations  of  f  with  fewer  parameters.  Thanks 
to  the  preprocessing  steps,  in  this  case  we  can  save  on  the  computational  cost  of  this 
procedure.  An  immediate  consequence  of  (5.15)  and  Properties  6.1  is  the  following. 

Properties  6.2  Assume  that  f  is  an  optimal  N -term  approximation  of  the  data  f  with 
accuracy  rj,  as  described  by  (6.2).  Then,  for  p  >  p,  NCOARSE  [f,  f)  —  p]  produces  an 
approximation  g  to  f  with  support  K,  such  that  ||g  — f||£2(v)  <  f/-  In  addition,  iff  G  f'^(V), 
we  have 

#(A)  <  (6-5) 

with  C  depending  only  on  s. 

Moreover,  determining  g  reguires  at  most  arithmetic  operations  and  no  sorts 

since  sorting  of  f  was  done  in  the  preprocessing  stage. 

To  simplify  notation  we  will  denote  throughout  the  remainder  of  the  paper  by 
NCOARSE  [f,  i)]  the  output  of  NCOARSE  [f,f)  —  77],  since  it  has  the  same  optimal 
approximation  properties  as  thresholding  the  exact  data. 

We  now  turn  to  the  primary  purpose  of  the  coarsening  procedure.  Recall  that  the 
role  of  COARSE  in  Algorithm  II  above  is  the  following.  If  v  is  a  given  vector  from 
if(V)  and  w  is  a  good  (finitely  supported)  approximation  to  v  in  the  f'2(V)-norm  but 
has  large  f'^(V)  norm  then  COARSE  uses  thresholding  to  produce  a  new  approximation 
with  slightly  worse  f'2(V)-approximation  properties  but  guaranteed  good  if(V)  norms. 
The  following  algorithm  gives  the  numerical  form  of  COARSE  that  we  shall  use.  The 
following  additional  properties  of  NCOARSE  follow  from  the  results  in  §  5. 
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Properties  6.3  Given  a  vector  w,  a  tolerance  0  <  rj  <  ||v||£2(v);  o  finitely  supported 
approximation  w  to  v  that  satisfies 

||v- w||^2(v)  <  r//5,  (6.6) 

the  algorithm  NCOARSE  [w,4r//5]  produces  a  new  approximation  w  to  v,  supported  on 
A,  which  satisfies 

||v-w||£2(v)  <^-  (6.7) 

Moreover,  the  following  properties  hold: 

(i)  If  V  E  r  =  (s  +  1/2)“^,  for  some  s  >  0,  then  the  outputs  w  and  A  of 

NCOARSE  satisfy 

l|v  -  w||£2(v)  <  C'||v||£»(v)#(A)“A  (6.8) 

(ii)  Ifw^  Gfiy),  T  =  {s  +  1/2)“^,  for  some  s  >  0,  then  the  output  w  o/ NCOARSE 
satisfies 

||w||^»(v)  <  C'||v||^»(v),  (6.9) 

where  C  depends  only  on  s  as  s  ^  oo. 

(iii)  The  cardinality  of  the  support  A  of  w  is  bounded  by 

#(A)  <  (6.10) 

Proof:  The  estimate  (6.7)  is  an  immediate  consequence  of  the  steps  in  NCOARSE.  (i) 
follows  from  Corollary  5.2  (see  (5.9)).  (ii)  is  proved  in  a  similar  fashion  to  Lemma  4.11. 
Let  K  :=  ff{A)  and  let  \k  be  the  best  approximation  to  v  from  Then,  as  in  (4.33), 
we  derive 

|w|^™(V)  <  C  (^|w  —  Vi^|£™(V)  +  |Vi^|£™(V)) 

<  C  (^(2A)^||w  —  Vi^||£2(v)  +  |v|£™(v))  (6-11) 

<  w||£2(v)  +  ||v|l£»(v)) , 

where  we  used  (4.32).  We  insert  (6.8)  into  (6.11)  and  add  ||w||£2(v)  to  both  sides  and 
arrive  at  (6.9).  The  estimate  (iii)  is  an  immediate  consequence  of  (5.15).  □ 

6.3  The  Assembly  of  A 

We  shall  need  to  compute  a  certain  finite  number  of  entries  of  A.  The  entries  that  need 
to  be  computed  will  be  prescribed  as  the  adaptive  algorithm  proceeds  and  are  not  known 
in  advance.  They  are  associated  to  the  application  of  one  of  the  compressed  matrices  A^ 
to  a  finite  vector  v,  as  will  be  discussed  below.  Therefore,  the  entries  are  computed  as 
their  need  arises.  When  we  compute  an  entry  of  A  we  store  it  for  future  possible  use.  We 
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shall  make  the  following 


Assumption  N2:  Any  entry  of  A  can  be  computed  (up  to  roundoff  error)  at  unit 
cost. 

In  some  cases,  this  assumption  is  completely  justified.  For  example,  if  the  operator 
A  is  a  constant  coefficient  differential  operator  and  the  domain  is  a  union  of  cubes,  then 
suitable  biorthogonal  wavelet  bases  are  known  where  the  primal  multiresolution  spaces 
are  generated  by  5— splines.  In  this  case,  the  functions  which  appear  in  the  integrals 
defining  the  entries  of  A  are  piecewise  polynomials.  Therefore,  they  can  be  computed  ex¬ 
actly.  When  A  is  a  differential  operator  with  varying  coeffcicients  or  when  A  is  a  singular 
integral  operator  the  entries  of  A  have  to  be  approximated  with  an  accuracy  depending 
on  the  desired  final  accuracy  e  of  the  solution.  It  is  then  by  far  less  obvious  how  to  realize 
Assumption  N2  and  a  detailed  discussion  of  this  issue  (which  very  much  depends  on 
the  individual  concrete  properties  of  A)  is  beyond  the  scope  of  this  paper.  We  therefore 
content  ourselves  with  the  following  indications  that  (N2)  is  not  quite  unreasonable.  A 
central  issue  in  [23,  35,  36]  is  to  design  suitably  adapted  quadrature  schemes  for  com¬ 
puting  the  significant  entries  of  wavelet  representation  of  the  underlying  singular  integral 
operator  in  the  following  sense.  The  trial  spaces  under  consideration  are  spanned  by  all 
wavelets  up  to  a  highest  level  J,  say.  Then,  it  is  shown  how  to  compute  a  compressed 
matrix  having  only  the  order  of  Nj  =  nonzero  entries  (up  to  possible  log  factors  in 
some  studies)  at  a  computational  expense  which  also  stays  proportional  to  Nj  (again 
possibly  times  a  log  factor).  Since  the  compression  in  these  papers  is  slightly  different 
from  the  one  used  here  and  since  only  fully  refined  spaces  have  been  investigated  these 
results  do  not  apply  here  directly.  Nevertheless,  they  indicate  that  the  development  of 
schemes  that  keep  the  computational  work  per  entry  low  is  not  completely  unrealistic. 

In  the  development  of  the  numerical  algorithm,  we  shall  need  constants  ci  and  C2  such 
that  (2.21)  holds.  In  practice,  it  is  not  difficult  to  obtain  sharp  estimates  of  the  optimal 
constants  since  as  J  grows,  they  are  well  approximated  by  the  smallest  and  largest  eigen¬ 
values  of  the  preconditioned  matrix  Avj  corresponding  to  the  setVj  =  {AGV;  |A|<J} 
associated  to  the  uniform  discretization  through  the  trial  space  Fvj.  For  simplicity  we 
will  take  n  :=  C2/C1  as  an  estimate  for  the  condition  number  of  A,  see  (2.24). 

We  next  discuss  the  quasi-sparsity  assumptions  that  we  shall  make  on  the  matrix  A. 

Assumption  N2:  We  assume  that  the  matrix  A  is  quasi- sparse  in  the  sense  that  it 
is  known  to  he  in  the  class  Bg  of  ^3.3  for  s  <  s*  for  some  s*  >  0.  We  recall  that  A  E  Bg 
implies  that  for  each  /c  =  1,  2, . . .,  there  is  a  matrix  A^,  with  at  most  2^ak  entries  in  each 
row  and  column,  that  satisfies 

||A- Afcll  <C2-’^^ak,  (6.12) 

with  a  summable  sequence  of  positive  numbers.  We  will  assume  that  the  positions 

of  the  entries  in  Ak  are  known  to  us. 
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We  have  discussed  previously  how  this  assumption  follows  from  the  original  elliptic 
equations  and  the  wavelet  basis.  In  particular,  they  are  implied  by  decay  properties  of 
the  type  (2.30).  The  compression  rules  leading  to  the  matrices  and,  in  particular,  the 
positions  of  the  significant  entries  are  in  this  case  explicitly  given  in  the  proof  of  Propo¬ 
sition  3.4  and  depend  only  on  k. 

In  the  development  of  the  numerical  algorithm,  we  shall  make  use  of  the  estimates 
(6.12)  in  the  form 

||A-Afc||<afc,  (6.13) 

where  the  constants  are  upper  bounds  for  the  compression  error  ||A  —  Afc||.  The 
might  simply  correspond  to  a  rough  estimate  of  C  in  (6.12)  or  result  from  a  more  precise 
estimate  of  ||  A  —  A^H  that  can  in  practice  be  obtained  by  means  of  the  Schur  lemma. 

The  entries  we  compute  in  A^  are  determined  by  the  vectors  to  which  A*,  is  applied. 
We  only  apply  A^  to  vectors  v  with  finite  support. To  compute  A^v  requires  only  that  we 
know  the  nonzero  entries  of  A^  in  the  columns  corresponding  to  the  nonzero  entries  of  v. 
Hence,  at  most  afc2^(7)^supp  v)  entries  will  need  to  be  computed.  We  shall  keep  track  of 
the  number  of  these  computations  in  the  analysis  that  follows. 

6.4  Matrix/ Vector  Multiplication 

It  is  clear  from  Algorithm  II  that  the  main  numerical  tasks  are  the  computation  of 
Galerkin  solutions  and  the  evaluation  of  residuals.  Both  rest  on  the  repeated  application  of 
the  quasi-sparse  matrix  A  to  a  vector  v  with  finite  support.  Since  the  matrices  and  vectors 
are  in  general  only  quasi-sparse  this  operation  can  be  carried  out  only  approximately  in 
order  to  retain  efficiency.  For  this,  we  shall  use  the  algorithm  of  §3.3  applied  to  B  =  A. 
We  recall  our  convention  concerning  the  application  of  an  infinite  matrix  to  a  finite  vector 
:  we  consider  the  vector  to  be  extended  to  the  infinite  vector  on  A  obtained  by  setting 
all  new  entries  to  be  zero.  The  extended  vector  will  also  be  denoted  by  v. 

Given  a  vector  v  of  finite  support  and  N  =  T^^^suppv,  we  sort  the  entries  of  v  and 
form  the  vectors  V[o],  V[j]  —  V[j_i],  j  =  1,  •  •  •,  [log  AJ.  For  j  >  log  A,  we  define  V[j]  :=  v. 
Recall  from  §  3  that  V[j]  agrees  with  v  in  its  2^  largest  entries  and  is  zero  otherwise.  This 
process  requires  at  most  A  log  A  sorts. 

We  shall  numerically  approximate  Av  by  using  the  vector 

Wfc  :=  AfcV[o]  -1-  Afc_i(v[i]  -  V[o])  H - h  Ao(v[fc]  -  V[fc_i])  (6.14) 

for  a  certain  value  of  k  determined  by  the  desired  numerical  accuracy.  As  noted  ealier,  this 
vector  can  be  computed  by  using  <  Gi2^  operations  and  requires  the  computation  of  at 
most  this  same  number  of  entries  in  A,  recall  Gorollary  3.10.  Note  that  if  2^  >  7)^:(supp  v), 
then  some  of  the  terms  in  (6.14)  will  be  zero  and  therefore  need  not  be  computed. 
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By  increasing  k,  we  increase  the  accuracy  of  the  approximation  to  Av.  In  partic¬ 
ular,  as  derived  in  §3.3,  see  (3.25),  we  have  the  error  estimate 

k-l 

||Av  —  Wfc||£2(v)  <  C2||v  —  V[fc]||£2(v)  +  Ofc||v[o]||£2(v)  +  X]  ~  ^[k-j-i]\\e2{^)^  (6.15) 

j=0 

where  aj  is  the  compression  bound  from  (6.13).  Note  that  ||v  —  vp]||£2(v)  =  l|v||^2(v)  ~ 

l|v[i]lli(v)  and  ||v[j]|||(v)  =  ELi  \\^[i]  “  V[z-i]|||(v)  Hence,  the  right  hand  side  of  (6.15) 
can  be  computed  for  any  k  with  at  most  C'(7)^(supp  v))  operations. 

With  these  remarks  in  hand,  we  introduce  the  following  numerical  procedure  for  ap¬ 
proximating  Av. 

APPLY  A[r/,v]  ^  (w,A): 

(i)  Sort  the  nonzero  entries  of  the  vector  v  and  form  the  vectors  vjoj,  vpj  —  j  = 

1,  •  •  • ,  [log  AJ  with  N  :=  ffsuppw.  Define  V[j]  :=  v  for  j  >  log  A. 

(ii)  Compute  ||v|||(v),  ||v[o]|l?2(v)^  h[j]  “  V[i-i]lll(v)^  J  =  1,  •  •  • ,  [log  Aj  +  1. 

(iii)  Set  k  =  0. 

(a)  Compute  the  right  hand  side  Rk  of  (6.15)  for  the  given  value  of  k. 

(b)  If  Rk  <  V  stop  and  output  k;  otherwise  replace  k  by  k  +  1  and  return  to  (a). 

(iv)  For  the  output  k  of  (iii)  and  for  j  =  0, 1,  •  •  • ,  /c,  compute  the  nonzero  entries  in 
the  matrices  A^k-j  which  have  a  column  index  in  common  with  one  of  the  nonzero 
entries  o/vpj  —  vp_i]. 

(v)  For  the  output  k  of  (iii),  compute  as  in  (6.14)  and  take  w(v,r])  :=  and 

A  =  supp  w . 

Properties  6.4  Civen  a  tolerance  rj  >  0  and  a  vector  with  finite  support,  the  algorithm 
APPLY  A  produces  a  vector  w(v,r])  which  satisfies 

||Av  -  w||£2(v)  <  h-  (6-16) 

Moreover,  if 'v  E  ifCV),  with  r  =  (s  +  1/2)”^/^  and  0  <  s  <  s* ,  then  the  following 
properties  hold: 

(i)  The  size  of  the  output  A  is  bounded  by 

#(A)  <  (6-17) 

and  the  number  of  entries  of  A  that  need  to  be  computed  is  < 

(ii)  The  number  o/ arithmetic  operations  needed  to  compute  w(v,  r/)  does  not  exceed 

-(-  2 A  with  A  :=  T^^^supp  v. 
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(m)  The  number  0/ sorts  needed  to  assemble  the  V[j],  j  =  0, 1,  •  •  • ,  [logiVj,  o/w(v,  r/) 
does  not  exceed  CNlogN. 

{iv)  The  output  vector  w  satisfies 


||w||£™(v)  <  C'||v||^»(v).  (6.18) 

Proof:  The  estimate  (6.16)  follows  from  the  preceding  remarks  centering  upon  (6.15). 
Properties  (i)-(iii)  follow  from  the  results  of  §  3.3  (see  Corollary  3.10).  Property  (iv)  is 
proved  in  the  same  way  that  we  have  proved  Proposition  3.8.  Namely,  for  j  =  0,1,  ■  ■  ■ ,  k, 
we  prove  that  ||wfc  —  Wj|1^2(v)  <  C'2“-^^||v||£™(v)  as  in  (3.25).  This  then  proves  (6.18) 
because  of  Proposition  3.2.  □ 


6.5  The  Numerical  Computation  of  Residuals 

Recall  that  Algorithm  II  heavily  utilizes  knowledge  of  residuals.  We  suppose  that  A  is 
any  given  finite  subset  of  V,  and  we  denote  as  usual  by  ua  the  Galerkin  solution  asso¬ 
ciated  to  the  set  A.  Since,  we  cannot  compute  ua  nor  its  residual  Aua  —  f  exactly,  we 
shall  introduce  numerical  algorithms  which  begins  with  an  approximation  v  to  ua  and 
approximately  computes  the  residual  Av  —  f.  For  this  computation,  we  introduce  the 
following  procedure,  which  involves  two  tolerance  parameters  771,772  reflecting  the  desired 
accuracy  of  the  computation  of  Av  and  of  f,  rspectively. 

NRESIDUAL[v,  A,  1,771,772]  ^  (r,A): 

(i)  APPLYA[v,77i]  ^  (w,Ai). 

(ii)  NCOARSE[f,  772]^(g,A2). 

(iii)  Set  r  :=  w  —  g  and  A  :=  supp  r  C  Ai  U  A2. 

Note  that,  due  to  the  various  approximations,  the  output  r  is  not  necessarily  supported 
in  V  \  A,  in  contrast  to  the  exact  residual  rA  =  f  —  Aua. 

Properties  6.5  The  output  r  o/NRESIDUAL  satisfies 

Ik  -  rAllfeW  <Vi+V2  +  C2IIV  -  ua||72(v)-  (6.19) 

Furthermore,  ifxi  E  (V),  with  r  =  {s  +  1/2)“^A  0  <  s  <  s*  (which  in  particular 

implies  f  G  t')f(V),  see  Proposition  3.8),  then  the  following  holds: 

(i)  The  support  size  of  the  output  is  bounded  by 

#(A)  <  #(Ai)  +  #(A2)  <  C'(77i"^/"||v||Jk(v)  +  %  (6.20) 
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(ii)  The  number  of  arithmetic  operations  used  in  NRESIDUAL  does  not  exceed 

C  +  2N  with  iV  :=  #(A) 

(iii)  The  number  0/ sorts  needed  in  the  computation  of  r  does  not  exceed  CNlogN. 

(iv)  The  output  r  satisfies 

||r||£™(v)  <  C'(||u|l£™(v), +||v||£™(v)).  (6-21) 

Proof:  The  estimate  (6.19)  follows  from 

||r  -  rA||^2(v)  <  ||f  -  g||^2(v)  +  ||Av  -  wH^^lv)  +  ||A(v  -  ua)||£2(v), 

and  (2.22).  All  other  properties  are  direct  consequences  of  the  Properties  6.2  and  6.4  of 

NCOARSE  and  APPLY  A.  □ 


6.6  A  Sparse  Galerkin  Solver 

This  subsection  will  be  concerned  with  the  computation  of  a  numerical  approximation 
ua  of  ua  for  any  given  set  A  C  V.  We  shall  discuss  this  issue  in  the  context  of  gradient 
methods.  A  similar  discussion  applies  to  conjugate  gradient  methods.  Given  a  set  A,  we 
thus  wish  to  solve 

PaAua  =  PAf.  (6.22) 

Suppose  that  we  are  provided  with  a  current  known  approximation  v  to  ua  with  v  sup¬ 
ported  on  A,  and  that  we  want  to  produce  an  approximation  ua,  supported  on  A,  such 
that  II Ua  —  ua||^2  —  V  some  prescribed  tolerance  rj. 

The  gradient  method  (or  damped  Richardson  iteration)  takes  as  the  next  approxima¬ 
tion 

v'  :=  V  -  ttA(AAV  -  PAf)  (6.23) 

where  q;a  is  to  be  chosen.  Then,  v'  is  also  supported  on  A  and  using  (6.22),  we  have 

I|ua  -  v'||^2(v)  <  ^'aIIua  -  v||^2(v)-  (6.24) 

where 

dA:=  ||Pa(I-«aA)||  (6.25) 

with  I  the  identity  matrix. 

To  turn  this  into  a  numerical  algorithm,  we  need  to  provide:  (i)  a  value  for  ua,  (h)  an 
approximation  for  Aav  —  PAf-  We  shall  take 

1 

UA  :=  :=  — ,  (6.26) 

C2 
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where  C2  is  our  bound  for  ||A||  given  in  (2.22).  With  this  choice,  it  follows  that 

0A  <  1  -  (6.27) 

with  K  =  C2I Cl  the  estimated  condition  number. 

We  next  discuss  the  computation  of  Aav  —  PAf  which  we  call  the  “internal  residual” . 
In  contrast  to  the  full  residual  Av  —  f  of  the  full  equation,  the  internal  residualcan  be 
computed  exactly  at  finite  cost.  However,  this  cost  remains  too  large  for  the  purpose 
of  obtaining  a  computationally  optimal  algorithm,  so  that  in  practice,  we  shall  need 
to  replace  the  internal  residual  by  a  numerical  approximation  r.  We  next  examine  the 
properties  we  shall  want  for  the  numerical  approximation  r  in  order  that  that  the  modified 
iterations  still  converge.  Suppose  for  a  moment  that  our  initial  approximation  v  satisfies 


i|uA  -  v|l£2(v)  <  <5  (6.28) 

for  some  5  >  0.  We  shall  show  in  a  moment  how  to  compute  an  r  such  that 

||r  -  (Aav  -  PAf)||^2(v)  <  ^-  (6.29) 

Given  such  an  r,  we  define 

vG=  V  —  ar 

Since  by  (6.23),  (6.26)  and  (6.29),  ||v'  —  v'||^2(v) 
conclude  that 

llw  -  v'||^2(v)  <  ||ua  -  v'|l£2(v)  +  ||v'  -  v'||^2(v)  <  (1  -  ^)<5  +  (6-31) 

with 

9:=l-±,  (6.32) 

The  vector  v'  is  our  numerical  computation  of  one  step  of  the  gradient  algorithm  with 

a  given  initial  approximation  v  and  error  estimate  <5.  Notice  that  (6.31)  gives  an  error 
estimate  which  allows  us  to  iterate  this  algorithm.  For  example,  at  the  next  iteration,  we 
would  replace  v  by  v',  and  5  hy  65. 

We  next  discuss  how  we  shall  compute  an  approximation  r  to  the  internal  residual 
which  will  satisfy  (6.29).  For  this,  we  shall  use  a  variant  of  the  routine  NRESIDUAL 
from  §6.5,  in  which  we  shall  confine  all  vectors  to  be  supported  in  A.  We  shall  denote 
this  new  subroutine  by  INRESIDUAL.  It  is  obtained  by  replacing  f  by  PaI  in  the 
NCOARSE  step  and  A  by  Aa  in  the  APPLY  A  step. 


(6.30) 

=  a||r  -  (Aav  -  PAf)|1^2(v)  <  we 


INRESIDUAL  [v.  A,  f,  r/i,  r/2]  ^  r 

(i)  APPLY  AA[v,r/i]  ^w; 

(ii)  NCOARSE  [PAf,r/2]  ^  g. 


43 


(iii)  Set  r  :=  g  —  w. 


Here  APPLY  Aa  means  that  A  is  replaced  by  Aa  in  the  fast  matrix  vector  mul¬ 
tiplication.  From  Properties  6.2  and  6.4  we  know  that  the  output  r  of  [v,  A,  f,  r/i,  r/2] 
satishes 


i|r  -  (Aav  -  PAf)|1^2(v)  <Vi  +  V2  (6.33) 


Thus  the  choice 


Vi  =  V2 


Cl  (5 


(6.34) 


suffices  to  ensure  the  validity  of  (6.29). 

Obviously  the  number  of  iterations  needed  to  guarantee  a  target  accuracy  77  of  the 
approximate  Galerkin  solution  depends  on  the  error  bound  <5  of  the  initial  approximation 
V  of  ua.  In  fact,  the  number  K  of  iterations  necessary  to  reach  this  accuracy  is  bounded 
by 


K  <  K{S,r])  := 


log 


V 


/  log0 


+  1. 


(6.35) 


While  the  above  analysis  gives  an  upper  bound  for  the  number  of  iterations  we  shall 
need  to  achieve  our  target  accuracy,  it  will  also  be  important  for  our  analysis  to  note 
that  this  target  accuracy  may  be  reached  before  this  number  of  iterations  if  the  currently 
computed  approximaton  r  to  the  internal  residual  is  small  enough.  The  following  remark 
(which  follows  from  (6.33))  makes  this  statement  more  precise. 


Remark  6.6  If  we  choose  rji  =  7/2  :=  Citj/Q,  where  t]  is  the  target  accuracy,  and  if  r  is 
the  corresponding  output  0/ INRESIDUAL  [v.  A,  f,  771, 7/2],  we  then  have 


I|ua  -  v||72(v)  <  Cl  ir||£2(v)  +  h/3, 

so  that 

I|ua  -  v||72(v)  <  h  */  l|r||£2(v)  <  2ci7//3. 
Note  that  conversely,  since  we  also  have  by  (6.33) 


r||7'2(V)  <  C177/3  -f  ||AaV  -  PAf||72(V), 


(6.36) 

(6.37) 


we  are  ensured  that 

||r||72(v)  <  2ci7//3  if  ||ua  -  v||£2(v)  <  cic7^7//3  =  7//3k,  (6.38) 

i.e.,  the  criterion  will  be  met  when  the  exact  internal  residual  is  small  enough. 

Proof:  To  prove  (6.36),  we  write 

Aa(ua  -  v)  =  (AaUa  -  PAf)  -  (Aav  -  PAf)  =  Aav  -  PaI  -  r  +  r. 

Since  (2.22)  and  (2.25)  imply  ||v  —  ua||72(v)  A  c7^||Aa(v  —  ua)||72(v);  (6.36)  follows. 
Clearly,  (6.36)  implies  (6.37).  The  rest  of  the  claim  follows  from  (2.22).  □ 
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After  these  considereations,  we  are  now  in  a  position  to  give  our  numerical  algorithm 
for  computing  Galerkin  approximations.  Given  a  set  A,  an  initial  approximation  v  to  ua, 
an  estimate  ||v  —  ua||^2(v)  <  ^  and  a  target  accuracy  rj,  with  0  <  rj  <  S,  the  approximate 
Galerkin  solver  is  defined  by  the  following: 


GALERKIN  [A,  v,  5,  r]]  ua: 

(i)  App/?/ INRESIDUAL  [v,  A,  f,  ^  r.  If  min  {d(5,  chirH^^CV)  +  r//3|  <  7],  de¬ 

fine  the  output  Ua  to  he  v  and  STOP,  else  go  to  (ii). 

(ii)  Set 


vG=  V  —  ar. 

Since  rj  <  S,  we  know  that  ||u  —  v'||£2(v)  <  ~  ’'^ll^2(v)-  Replace  v  by  v',  S  by  65 

and  go  to  (i). 

The  relevant  properties  of  GALERKIN  can  be  summarized  as  follows. 


Properties  6.7  Given  as  input  a  set  A,  an  initial  approximations  to  the  exact  Galerkin 
solution  Ua  which  is  supported  on  A,  an  initial  error  estimate  5  for  ||ua  —  v||£2(v)  cind  a 
target  accuracy  rj,  the  routine  GALERKIN  produces  an  approximation  ua  to  ua  which 
is  supported  on  A  and  satisfies 


||ua  -  ua||^2(v)  <  7-  (6-39) 

Moreover,  if  K  is  the  number  of  modified  gradient  iterations  which  have  been  used  in 
GALERKIN  to  produce  ua,  one  also  has 


||ua  -  ua||^2(v)  <  ^^(5  (6.40) 

with  6  defined  by  (6.32).  Gonseguently,  the  number  of  iterations  K  is  always  bounded  by 


K  <  K{5,p)  = 


1  ^ 
logj 


/  log0 


(6.41) 


Moreover,  z/u  G  if(V),  with  r  =  (s  +  1/2)  ^  and  0  <  s  <  s*,  then  the  following  are 
true: 


(i)  The  output  ua  of  GALERKIN  [A,  v,  5,  r]]  satisfies 

||ua||^»(v)  <  C{K)  (||v||^»(v)  +  ||u||^»(v))  ,  (6.42) 

where  the  constant  C{K)  depends  only  on  the  number  of  iterations  K . 

(ii)  The  number  of  arithmetic  operations  used  in  GALERKIN  [A,  v,  <5,  t)]  is  less 
than 

c(K)  (iiv|i;£y,  +  iiuii;£;v))  r''‘ + ca'(#a), 

where  the  constant  C{K)  depends  only  on  the  number  of  iterations  K.  The  number 
o/ sorts  does  not  exceed  K{ff  A)  \og{ff  A). 


45 


Proof:  The  first  part  of  the  assertion  has  been  already  established  in  the  course  of  the 
preceding  discussion.  In  particular,  the  bound  on  the  maximal  number  K  of  iterations 
clearly  follows  from  (6.35). 

As  for  property  (i),  we  simply  remark  that  (iv)  in  Properties  (6.5)  of  NRESIDUAL 
also  applies  in  the  case  of  the  modified  procedure  INRESIDUAL,  so  that  after  one 
modified  gradient  iteration  we  have 

||v  ||£»(v)  <  Umax  {||v||^„(^),  ||u||£»(v)}  . 

The  assertion  (i)  follows  therefore  by  iterating  this  argument:  denoting  by  the  current 
approximation  after  k  iterations,  we  obtain  that  ||v^||£™(v)  <  U(/c)(||v||^™(v)  +  ||u||^»(v)). 

To  estimate  the  number  of  arithmetic  operations  in  this  algorithm,  we  can  use  the 
bound  on  the  number  of  operations  for  NRESIDUAL  ((ii)  in  Properties  (6.5)),  which 
also  applies  to  INRESIDUAL.  According  to  this  property,  at  the  k-th  iteration,  the 
application  of  INRESIDUAL  to  requires  at  most  C  +  l|u||£™^(v))  V  + 

arithmetic  operations.  We  add  each  of  these  estimates  for  operation  count  over 
/c  =  0, 1, . . . ,  A  and  use  the  estimate  on  ||v^||^»(v)  to  obtain  the  estimate  in  (ii). 

Finally,  at  each  iteration,  the  number  of  sorts  is  clearly  bounded  by  ^A.\og{^h), 
which  implies  the  bound  in  K^A.\og{^h)  for  the  global  procedure.  □ 

The  possible  growth  of  the  constants  C{K)  in  (6.42)  shows  the  importance  of  control¬ 
ling  the  number  of  iteration  K.  The  estimate  (6.41)  expresses  that  this  is  feasible  if  the 
initial  accuracy  bound  <5  is  within  a  fixed  factor  of  the  desired  target  accuracy  r]  in  each 
application  of  GALERKIN.  In  the  setting  of  Algorithm  III  below  this  will  indeed  be 
the  case. 


7  Numerical  Realization:  The  Adaptive  Algorithm 

We  now  have  collected  all  the  ingredients  that  are  needed  to  construct  an  optimal  adaptive 
algorithm,  both  in  terms  of  memory  size  and  computational  cost.  The  purpose  of  this 
section  is  to  describe  this  algorithm  and  to  prove  its  optimality. 

7.1  General  principles  of  the  Algorithm 

Recall  from  §6.1  that  we  start  with  an  estimate  ||f||^2(v)  <  F.  Introducing  the  sequence 
of  tolerances 

£,  :=2->Fcr‘.  (7.1) 

we  see  that  Aq  :=  0  and  uaq  =  0  are  an  admissible  initialization  in  the  sense  that 

||u  —  uao|1£2(v)  <  Co- 

Algorithm  III  conceptually  parallels  the  idealized  version  Algorithm  II.  Its  core 
ingredient  is  a  routine  called  NPROG  that  associates  to  a  triplet  (ua.  A,  <5)  such  that  ua 
is  supported  in  A  and  ||ua  —  u||£2(v)  <  <5,  a  new  pair  (u^.  A)  such  that  is  supported  in 
A  and  ||u^  -  u||^2(v)  <  <5/2. 
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Iterating  this  procedure  thus  builds  a  sequence  (ua.,  Aj)j>o  with  ua-  supported  in  Kj 
such  that 

I|u-uaJ^2(v)  <  Cj.  (7.2) 

If  e  is  the  target  accuracy,  the  algorithm  thus  stops  after  J  steps  where  J  is  the  smallest 
integer  such  that  ej  <  e. 

As  in  Algorithm  II  the  routine  NPROG  itself  will  consist  of  possibly  several  appli¬ 
cations  of  a  procedure  NGROW  described  below,  which  parallels  GROW  in  Algorithm 
II,  followed  by  NCOARSE  for  exactly  the  same  reasons  that  came  up  in  §5. 

In  contrast  to  Algorithm  II,  the  selection  of  the  next  larger  index  set  done  by 
NGROW  will  have  to  be  based  on  an  approximate  residual  obtained  by  NRESID- 
UAL  rather  than  on  the  exact  one.  We  shall  also  use  the  approximate  Galerkin  solver 
dehned  by  NGALERKIN  to  derive  the  intermediate  approximations  of  the  solution  af¬ 
ter  each  growing  steps.  Thus,  the  error  reduction  in  this  growing  procedure  requires  a 
more  rehned  analysis,  involving  the  various  tolerances  in  these  procedures.  We  shall  hrst 
address  this  analysis  which  will  result  in  several  constraints  on  the  tolerance  parameters. 

7.2  The  growing  procedure 

At  the  start  of  the  growing  procedure  that  will  dehne  NPROG,  we  are  given  set  A,  an 
approximate  solution  ua  supported  on  A  and  a  known  estimate  ||u  —  ua||£2(v)  <  <5. 

We  set  A°  :=  A  and  uao  :=  ua.  The  growing  procedure  will  build  iteratively  some 
larger  sets  A^,  /c  =  0, 1,  •  •  •,  and  approximate  solutions  Ua'*)  ^ind  will  be  stopped  at  some 
K  such  that  we  are  ensured  that 


l|u-UAif||^2(v)  <  <5/10,  (7.3) 

so  that  applying  NCOARSE  [uA/f,2(5/5]  will  output  the  new  set  A  and  approximate 
solution  such  that  ||u  —  UaI|£2(v)  <  ^/2.  The  choice  <5/10  in  (7.3)  is  justihed  by  the 
Properties  6.3  of  the  thresholding  procedure:  it  ensures  the  optimality  of  the  approximate 
solution  and  the  control  of  its  7)f(V)  norm  (see  (i)  and  (ii)  in  Properties  6.3). 

As  in  Algorithm  II,  the  growing  procedure  will  ensure  a  geometric  reduction  of  the 
error  in  the  energy  norm  ||u  —  UAfc||  where  Ua^  is  the  exact  Galerkin  solution.  Although 
it  will  not  ensure  such  a  reduction  for  ||u  —  UAif||^2(v);  we  shall  still  reach  (7.3)  after  a 
controlled  number  of  steps. 

The  procedure  NGROW  generating  the  sets  A^  can  be  described  as  follows:  given  a 
set  A  and  an  approximation  ua  supported  on  A,  we  compute  an  approximate  residual  r 
and  select  the  new  set  A  D  A  as  small  as  possible  such  that 

l|PA/Ar|lfe(v)  >  7l|r||^2(v),  (7.4) 

for  some  hxed  7  in  (0, 1].  This  can  be  done  by  taking  A  :=  A  U  A^  where 

(A^  PAcr)  =  NCOARSE  [r,  -  72||r|l,,(v)]. 
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This  procedure  can  thus  be  summarized  as  follows. 


NGROW  [A,  UA,  6, 6,  f,  7]  ^  (A,  r) 

Given  an  initial  approximation  ua  to  the  Galerkin  solution  ua  supported  on  A  the  proce¬ 
dure  NGROW  consists  of  the  following  steps: 

(i)  Apply  NRESIDU AL  [ua,  A,  f ,  6]  ^  (AA  r) . 

(ii)  Apply  NCOARSE  [r,  vT^^7^i|r||^2(v)]  — ^  (A^^,  Pa^i")  and  define  A  :=  A  U  A^. 

It  is  interesting  to  note  that  we  allow  the  situation  where  7  =  1,  in  which  case  we 
simply  have  A  =  A  U  A’’.  This  was  not  possible  with  GROW  in  Algorithm  II  since  A 
could  then  be  the  full  infinite  set  V. 

Properties  7.1  The  residual  computed  by  NGROW  satisfies  the  estimate 

Ik  ~  '^a||^2(V)  <  Cl  +  ^2  +  C2||ua  —  Ua||^2(V)-  (7.5) 

If  u  E  with  r  =  (s  +  1/2)“^  and  0  <  s  <  s*,  then  the  following  are  true: 

(i)  The  cardinality  of  the  output  A  0/ NGROW  can  he  bounded  by 

#(A)  <  7^(A)  +  GC  ^  (l|hA||^m(v)  +  ll^ll^™(v))  ’  (7-6) 

where  C  :=  min{Ci,C2}- 

(ii)  The  number  0/ arithmetic  operations  used  in  NGROW  is  less  than 

M(o  :=  c  +  iiu|i;£7,)  +  #(a))  ,  (7.7) 

(iii)  The  number  0/ sorts  does  not  excede  CM[ff)  logM(C). 

Proof:  The  first  part  of  the  assertion  follows  from  (6.19).  The  claims  (i),  (ii)  and  (iii) 
follow  from  (i),  (ii)  and  (iii)  in  the  Properties  6.5  of  NRESIDUAL.  □ 

In  our  growing  procedure,  the  tolerance  parameters  Ci  and  C2  will  be  related  to  the 
initial  accuracy  <5  by  Ci  =  and  C2  =  0.2^  where  qi  and  q2  are  fixed  parameters  that 
we  shall  specify  below  through  our  analysis.  Similarly,  we  shall  always  set  the  tolerance 
parameter  in  the  applications  of  NGALERKIN  in  such  a  way  that  the  approximate 
solutions  UAfc  will  always  satisfy 


||UAfc  -  UAfc||^2(V)  <  g3(5/c2,  (7.8) 

where  q^  is  another  parameter  to  be  specified  later  and  UAfc  is  the  exact  Galerkin  solution. 

Note  that  (7.8)  is  not  ensured  for  k  =  0,  so  that  the  very  first  step  of  our  growing 
procedure  should  be  to  replace  uao  by  the  output  of  NGALERKIN  [A°,  uao,  <5,  q3S/c2]. 
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The  growing  procedure  will  then  proceed  as  follows:  for  /c  >  0,  we  shall  define  as 
the  first  output  of  NGROW  u^fc-i,  qiS,  q2S,  f,  7].  We  then  define  u^/c  as  the  output 
of  NGALERKIN  [A^,  u^fc-i,  go<5,  (73(5/c2],  with  the  constant  go  still  to  be  specified.  It 
follows  that  (7.8)  will  automatically  be  satisfied  by  (6.39). 

Regarding  the  parameter  go,  we  need  to  choose  its  value  so  that  at  each  iteration  we 
have 

||u  -  UAfc||£2(v)  <  (7.9) 

because  we  are  using  uaj.  as  the  input  for  the  next  application  of  NGALERKIN.  Now, 
for  each  k  >  0,  we  have 

l|u  —  UAfc||^2(V)  <  ||u  —  UAfc||£2(V)  +  ||UAfc  —  UAfc||£2(V) 

<  Ci^^'^WvL  -  UAfc||  +  g3(5/c2 

<  -  uao||  +  g3(5/c2 

<  /t^A||u  -  Uao||^2(V)  +  g3<5/c2 

<  (k^A  +  q^/c2)5, 

where  we  have  used  the  monotonicity  of  the  error  ||u  —  UAfc||  as  the  sets  A^  are  growing. 
Hence,  we  see  that  we  can  take  go  :=  +  g3/c2.  With  this  choice  of  go  and  with  any 

fixed  choice  of  go,  (6.41)  the  Properties  6.7  shows  that  the  number  of  iterations  within 
each  application  of  NGALERKIN  is  uniformly  bounded  independently  of  k  and  <5. 
Note  also  that  in  terms  of  the  parameters  gi,g2,g3,  from  (7.5)  and  (7.8)  we  deduce 

||r^  —  rAfc||£2(v)  <  {qi  +  q2  +  qs)^,  (7-10) 

where  is  the  second  output  of  NGROW  [A^,  UAfc,  gi5,  g25,  f,  7]- 

In  order  to  analyze  the  error  reduction  in  our  growing  procedure,  we  shall  need  to 
relate  the  property  (7.4)  that  defines  NGROW  with  the  property  (4.8)  which  is  known 
to  ensure  a  fixed  reduction  of  the  error  ||u  —  ua||.  Using  our  error  estimate  (7.10)  we 
obtain 


PAfc+irAjl£2(V) 


>  ||PAfc+ir^||^2(v)  -  ||PAfc+i(r^  -  rAfc)llfe(v) 

>  7l|r1|^2(v)  -  ||r^ -rAfc||£2(v) 

>  7lkA'=ll^2(V)  -  (1  +  7)lk^  -  rAfcllfelv) 

>  7l|rAfc||^2(v)  -  (1  +  7)(gi  +  q2  +  qs)^. 


(7.11) 


Of  course,  we  wish  to  ensure  that  our  above  choice  of  the  expanded  set  which  was 
based  on  the  approximate  residual  does  capture  also  a  sufficient  bulk  of  the  true  residual 
V/^k.  This  can  be  indeed  inferred  from  the  above  estimate  provided  that  the  perturbation 
on  the  right  hand  side  is  small  compared  with  the  first  summand.  If  this  is  not  the  case 
the  choice  of  the  parameters  g*  should  ensure  that  the  residual  itself  and  hence  the  error 
is  already  small  enough.  The  following  observation  describes  this  in  more  detail. 
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Remark  7.2  Given  any  q4  >  0,  suppose  that  the  parameters  qi,q2,q3  are  chosen  small 
enough  that 


Qs  ,  2(l  +  7)(gi  +  g2  +  g3)\  ^ 

—  + -  <  g4. 

,C2  7Cl  J 


Then,  for  constructed  from  as  explained  above,  one  either  has 


(7.12) 


||U  -  UAfci|£2(V)  <  ^4^, 


(7.13) 


or 

where 


||u  -  UAi+l||  <  9||u-  UAk|l, 


(7.14) 


(7.15) 


Proof:  let  q  :=  (l  +  7)(gi  +  q2  +  (73).  We  distinguish  two  cases.  If  7||rAfc|k2(v)  <  2g(5,  then 
by  (2.22)  we  have  7Ci||u  —  UAfc||£2(v)  <  Combining  this  with  the  estimate  (7.8),  we 
obtain 

||u  —  UAfc||^2(V)  <  (—  H - <5, 

\C2  ICiJ 

which,  in  view  of  (7.12)  proves  (7.13).  Alternatively,  when  7||rA'' Il£2(v)  >  2g(5,  we  infer 
from  (7.11)  that 

'1 

||PAfc+irAfc||^2(v)  >  2  (7-16) 

which  is  the  desired  prerequisite  for  error  reduction  in  the  energy  norm.  In  fact,  we  can 
invoke  Lemma  4.1  to  conclude  that  (7.14)  holds  for  9  defined  in  (7.15).  □ 


It  remains  to  adjust  the  various  parameters  qi,q2,q3-  To  this  end,  one  should  keep 
in  mind  that  the  growing  procedure  aims  to  achieve  the  accuracy  in  (7.3)  after  a  finite 
number  of  steps  K. 

In  view  of  Remark  7.2,  a  first  natural  choice  seems  to  be  ^4  =  1/10  since  the  occurrence 
of  case  one  in  Remark  7.2  would  then  imply  (7.3).  However,  with  such  a  choice,  it  could 
still  happen  that  at  the  Ath  stage  of  the  growing  procedure,  case  one  comes  up  but  is  not 
discovered  by  any  error  control.  In  this  case,  we  will  need  to  make  sure  that  subsequent 
steps  still  satisfy  (7.3).  For  k  >  i,  we  have 

||u  —  UAfc||£2(V)  <  ||u  —  UAfc||£2(V)  +  ||UAfc  —  UAfc||^2(V) 

<  ^=||u  -  UAfc||  +  g3(5/c2  <  ^||u  -  UAi||  +  g3<5/c2 

VCi  yW 

<  (qaVk  +  S, 

where  we  have  again  made  standard  use  of  (2.21),  the  best  approximation  property  of 
Galerkin  solutions  and  (7.8).  Thus  our  first  requirement  is 

(^g4V^  +  ^)  <  1/10.  (7.17) 
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We  next  have  to  make  sure  that  if  case  one  never  occurs  a  uniformly  bounded  finite 
number  of  steps  suffices  to  reach  (7.3).  In  fact,  we  infer  from  (7.14)  that 


l|u  —  UAfc||£2(V)  <  ||u  —  UAfc||£2(V)  +  II^Afe  —  UAfc||^2(V) 

< 


1  6^ 

"u  -  UAfc||  +  g3(5/c2  <  ^=||u  -  uao||  +  g3(5/c2 


< 


C2 


(7.18) 


Thus  our  second  requirement  in  order  to  achieve  (7.3)  is  that  for  sufficiently  large  K, 

<  1/10,  (7.19) 


but  this  is  always  implied  by  our  first  requirement  (7.17)  for  K  sufficiently  large  but  fixed. 

Finally,  we  wish  to  install  intermediate  error  controls  to  avoid  unnecessarily  many 
steps  in  the  above  growing  procedure.  To  this  end,  we  write 


U  -  UAfc  =  U  -  UAfc  +  UAfc  -  UAfc. 
and  deduce  from  (7.8)  that  at  any  intermediate  stage 

Q  S 

||u  -  UAfc||£2(v)  <  (||ri£2(v)  +  ||r^  -  rAj|fe(v))  +  ^ 

Therefore,  using  (7.10),  we  find 

||u  -  UAfc||£2(v)  <  (ik^lkaiv)  +  {qi  +  q2  +  q3)s)  +  —  (7.20) 

Thus,  imposing  the  requirement 

{qi  +  q2  +  q3  + n~^q3)  <  (7.21) 

we  can  stop  the  iteration  if  the  following  test  of  the  current  approximate  residual  is 
answered  affirmatively 

l|i-‘llfe(v)  <  (7.22) 

Choice  of  parameters:  In  summary,  possible  choices  for  these  parameters  are  limited 
by  (7.12),  (7.17)  and  (7.21).  A  simple  possibility  is  to  take 


q4  ■  = 


1 

20fi; 


(7.23) 


Then  choose  qi  =  q2  =  q3  such  that  (7.12),  (7.17)  and  (7.21)  hold.  We  then  define 

qo  :=  +  ^3/02. 

Thus  one  finally  sees  from  (7.18)  that  the  maximal  number  of  steps  needed  to  achieve 
(7.3)  is  bounded  by 


K  ;=  K{n,d) 


log  20fi; 

I  log  d\ 


(7.24) 
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7.3  Description  of  the  algorithm 

We  are  now  in  a  position  to  describe  the  main  step  NPROG  in  our  algorithm.  We  fix  a 
value  of  7  with  0  <  7  <  1  and  we  choose  parameters  go,  <?i,  <?2,  gs,  g4  as  in  the  Choice  of 
parameters  of  the  previous  subsection.  We  fix  these  values.  The  analysis  of  the  previous 
subsection  shows  that  a  uniformly  bounded  finite  number  of  applications  of  NGROW 
suffices  to  reduce  the  initial  error  by  the  desired  amount.  The  NPROG  can  thus  be 
summarized  as  follows. 

NPROG  [A,  V,  5,  f]  ^  (A,  V,  f ) 

Given  a  set  A,  an  approximation  v  to  the  exact  solution  u  of  (4.1)  whose  support  is 
contained  in  A  and  such  that  ||v  —  u||£2(v)  A  <5,  the  procedure  NPROG  consists  of  the 
following  steps: 

(i)  Apply  GALERKIN  [A,  v,  <5,  g35/c2]  — ^  ua.  Set  A°  :=  A,  uao  :=  ua,  k  :=  0. 

(ii)  Tpp/y  NGROW  [A^,  Ua/c,  gi(5,  g2(5,  f,  7]  — ^  (A^+^,r^). 

(iii)  If  ||r^||£2(v)  <  Cl (5/20  or  k  =  K  defined  in  (7.24)  go  to  (iv),  otherwise  apply 
GALERKIN  [A^+^,  UAfc,  go(5,  g3(5/c2]  — ^  UAfc+i.  Replace  k  hy  k  +  1,  by  A^+^, 
UAfc  by  UAfc+i  and  go  to  (ii). 

(iv)  Apply  NCOARSE  [uAfc,  2(5/5]  — ^  (A,  v),  set  r  :=  r^  and  STOP. 

The  relevant  properties  of  NPROG  can  be  summarized  as  follows. 

Properties  7.3  The  output  v  0/ NPROG  satisfies 

||u-v||,2(v)  <<5/2.  (7.25) 

Moreover,  z/u  e  if(V),  with  r  =  (s  +  1/2)“^  and  0  <  s  <  s*,  then  the  following  are 
true: 

(i)  One  has  the  bound 

||v||£»(v)  <  C'||u||£»(v),  (7.26) 

and  the  cardinality  of  A  is  bounded  by 

#(A)  <  Cr‘''*||u||;{7.  (7.27) 

(ii)  The  cardinality  of  all  intermediate  sets  A^  produced  by  NGROW  can  be  bounded 
by 

#(A)  +  cr‘/-  (i|v|i;/7,  +  i|u|i;'7) .  (7.28) 

(iii)  The  number  of  arithmetic  operations  used  in  NPROG  [A,  v,  S,  f]  is  less  than 

G  -.=  C  +  ||u||;i7,)  +  #(A))  .  (7.29) 

The  number  0/ sorts  does  not  excede  CGlogG. 
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Proof:  By  our  choice  of  parameters  qi^i  =  1,2,3, 4,  Remark  7.2  and  the  subsequent 
discussion  show  that  after  at  most  K  steps,  K  given  by  (7.24),  the  reduction  (7.3)  is 
achieved.  The  estimate  (7.25)  is  then  an  immediate  consequence  of  (6.6)  and  (6.7)  in 
Properties  6.3.  Moreover,  when  u  G  ^^"(V),  with  r  =  (s  +  1/2)“^,  then  (i)  is  a  a  direct 
consequence  of  (ii)  and  (iii)  in  Properties  6.3.  By  a  repeated  application  of  (6.42)  in 
Properties  6.7  we  conclude  that 

l|uAfc||^»(v)  <  C  (||v|l^™(v)  +  ||u|l^»(v))  .  (7.30) 

We  have  used  here  that  only  a  uniformly  bounded  number  of  applications  of  NGROW 
and  GALERKIN  is  used  in  NPROG.  Combining  (7.30)  with  (7.6)  in  Properties  7.1 
yields  the  estimate  (7.28)  in  (ii). 

Note  that  the  same  is  true  for  the  possibly  somewhat  larger  sets  A  U  generated  in 
NGROW,  since  we  accept  the  case  7  =  1.  The  remaining  assertion  (iii)  is  also  obtained 
by  combining  (7.30)  with  (7.7)  in  Properties  7.1.  □ 

We  are  now  prepared  to  describe 

Algorithm  III 

(i)  Initialization:  Let  e  >  he  the  target  accuracy.  Set  A  :=  0,  v  =  0  and  S  :=  F,  where 

F  is  defined  at  the  beginning  of  this  section.  Select  the  parameters  qo,qi,q2,q3,qA 
according  to  the  above  Choice  of  Parameters  and  fix  these  parameters. 

(ii)  If  5  <  e,  accept  u(e)  :=  v,  A(e)  :=  A  as  the  final  solution  and  STOP.  Otherwise, 

apply  NPROG  [A,  v,  S,  f]  — ^  (A,  v,  f ) . 

(iii)  If  l|r|l  ^2(v)  +  (a  +  I2  +  (1  +  ^  ^)q'3)<5  <  Cie  accept  u(e)  ;=  UAfc,  A(e)  =  A^  as  the 
solution,  where  UAfc,  A(e)  =  A^  are  the  last  outputs  o/NGROW  in  NPROG  before 
thresholding.  Otherwise,  replace  S  by  5/2,  v  by  v  and  A  by  A  and  go  to  (ii). 

Remark  7.4  ITe  see  that  the  finest  accuracy  needed  on  the  data  f  is  2q2e  in  the  last 
application  0/ NPROG  so  that  we  can  start  with  an  estimate  f  with  rj  =  2q2e  in  (6.2). 

Remark  7.5  The  proper  choice  of  (q'o,  O'!,  Q's,  Q'4)  is  meant  to  ensure  the  convergence 
of  Algorithm  III,  as  well  as  the  control  of  the  operation  count  in  each  application  of 
NPROG.  This,  in  turn,  allows  us  to  prove  the  optimality  of  this  algorithm,  as  shown 
below.  Roughly  speaking,  convergence  is  ensured  if  these  parameters  are  sufficiently  small, 
but  choosing  them  too  small  typically  increases  the  constants  that  enter  the  optimality 
analysis  (e.g.  the  number  of  iterations  needed  in  GALERKIN  or  APPLY  A),  so  that 
a  proper  tuning  should  really  be  effective  in  practice.  In  particular,  it  might  be  that  our 
requirements  in  the  Choice  of  Parameters  are  too  pessimistic  and  that  the  algorithm 
still  works  with  larger  tolerances. 
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7.4  The  main  Result 

The  convergence  properties  of  Algorithm  III  can  be  snmmarized  as  follows. 

Theorem  7.6  Assume  that  A  G  Bg,  with  0  <  s  <  s* ,  and  that  A  is  an  isomorphism  on 
f'2(V)  and  suppose  that  the  assumptions  (Nl)— (N3)  are  satisfied.  Let  u  be  the  solution  of 
(2.17),  that  is,  Au  =  f.  Then  for  any  e  >  0  and  any  f  G  f'2(V),  Algorithm  III  produces 
an  approximation  u  =  u(e)  with  N  =  N{e)  :=  #snppu(e)  <  oo  satisfying 

l|u-u(e)||£2(v)  <  e.  (7.31) 

Moreover,  Algorithm  III  is  optimal  in  the  following  sense.  Ifu  G  if(V),  r  =  (s+l/2)“^ 
for  some  0  <  s  <  s*,  then  N{e)  <  C'e“^/'^||u||£™(v)  and  the  computation  of  u{e)  reguires  at 
most  CN[e)  arithmetic  operations  and  at  most  CN{e)  log  A(e)  sorts,  where  the  constants 
C  are  independent  off  and  e. 

Proof:  Let  ej  :=  2~^F,  j  =  0, 1, . . ..  Let  k  be  the  smallest  integer  snch  that  <  e.  The 
algorithm  shnts  down  when  at  an  iteration  j  either  (i)  ||r-^|1^2(v)  +  (Q'i+'?2  +  (l  +  ^~^<?3)  <  Cie 
or  (ii)  S  <  e.  In  the  first  case,  (7.31)  is  satisfied  becanse  of  (7.20).  When  case  (i)  is  not  met 
for  any  j  =  0, . . .  ,k,  then  (7.25)  in  Properties  7.3  shows  that  Algorithm  III  prodnces 
a  seqnence  (Aj,UAj)  snch  that  ||u  —  ua^- <  ^j,  where  ej  =  2~^F.  Hence  the  desired 
target  accnracy  e  is  reached  when  j  =  k.  In  either  case,  the  algorithm  will  need  at  most 
k  steps  to  reach  (7.31). 

As  for  the  complexity  analysis,  if  u  G  f')i’(V),  r  =  (s  +  1/2)^^  for  some  0  <  s  <  s*, 
we  conclnde  from  (7.26)  that  ||ua  |l£™(v)  <  C'||u||^»(v)  for  all  j  and  from  (7.27)  that 
#(A,-)  <  CG,  with  G,  := 

Thns,  on  acconnt  of  (ii)  and  (iii)  in  Properties  7.3,  the  nnmber  of  arithmetic  operations 
and  the  nnmber  of  sorts  at  the  jth  stage  of  the  algorithm  can  be  bonnded  respectively 
by  G  Gj  and  G  log  Gj.  The  assertion  now  follows  by  snmming  these  estimates  over 
j  =  0,...,/c.  □ 

We  conclnde  with  briefly  snmmarizing  the  conseqnences  of  the  above  theorem  with 
regard  to  the  original  operator  eqnation  (2.5). 

Corollary  7.7  Assume  that  A  :  ^  is  an  isomorphism  and  let  u  denote  the  exact 

solution  of  Au  =  f  for  some  f  G  Suppose  that  A  and  the  wavelet  bases  T,  T  satisfy 

assumptions  (A1)-(A3)  SO  that,  in  particular,  the  preconditioned  wavelet  representation 
A  of  A  belongs  to  Let  s*  :=  min  —  1}.  Then  for  any  f  G  H~^  and  every 

e  >  0,  Algorithm  III  produces  a  seguence  u(e)  =  {MA}AeA(e)  such  that 

\\u-  <  e- 

AeA(e) 

Moreover,  if  for  some  0  <  s  <  s*  and  r  :=  (s  +  1/2)“^  the  solution  u  belongs  to  the 
Besov  space  then  the  number  N{e)  :=  #A(e)  is  bounded  by 

At  most  G  N{e)  arithmetic  operations  and  G  N[e)  log  N[e)  sorts  are  needed  for  the  com¬ 
putation  0fUA{e)- 
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